As search engines crawl more file formats besides HTML, you might consider optimizing your PDF documents for better search engine visibility. Many search engines now index image, video, and PDF files, as well as spreadsheets, PowerPoint presentations, and more.

  • Google searches 12 main file types in addition to HTML, the most common being PDF, PostScript, and Microsoft Office formats.

  • FAST AllTheWeb indexes images, videos, PDF, MP3, and FTP files and eventually plans to index Flash and other multimedia files.

  • AltaVista recently expanded its multimedia index to include news photos, maps, charts, and diagrams, as well as image, audio and video files.

Search engine users and researchers are finding more PDF files than ever before, which is useful for those looking for government and technical documents. However, these file formats can be awkward to access because many of the files were not prepared with search engine robots in mind.

Limitations of Alternate File Formats

While it's nice to have many different file formats available, they may not be as convenient to access as HTML. Some can take a long time to download, require special software, or pose safety risks. The main limitations are listed below.

  • Slow Download Time: Users gathering information are in a hurry. Yet PDF files can be lengthy and contain numerous graphics. They can be hard to read on the screen. Users with dial-up modems may balk at the time it takes to download. This is also a problem with audio, video, and Flash files.

  • Software Requirements: Users may not take the time to download the special plug-ins required to view your documents. Some users with older operating systems and software may have problems with current word processing and spreadsheet documents.

  • Virus Risk: Along with the convenience of alternate file formats comes the vulnerability to pernicious new viruses that might be lurking in these files. You must have adequate virus protection and also exercise caution when downloading or opening files from unknown sources.

  • Security Hazards: Confidential company data in spreadsheets and other files could be vulnerable to outsider curiosity, creating a security breach. Protect your confidential data with password protection on a secure server. You can also create a robots.txt file to disallow search engines from indexing specific files and directories containing confidential data.
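
The robots.txt approach mentioned above can be sketched and tested before the file goes live. The following is a minimal sketch in Python using the standard library's robots.txt parser. The directory and file names (/private/, /reports/budget.xls) and the example.com URLs are hypothetical placeholders, not paths from any real site. Keep in mind that robots.txt is only a request to well-behaved robots; it does not replace password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The paths below are placeholders
# chosen for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /reports/budget.xls
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/whitepapers/guide.pdf",
        "http://www.example.com/private/salaries.xls",
        "http://www.example.com/reports/budget.xls",
    ):
        status = "allowed" if is_allowed(ROBOTS_TXT, url) else "disallowed"
        print(url, "->", status)
```

Checking the rules this way helps confirm that the documents you want indexed stay reachable while the confidential ones are excluded.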

Google has anticipated some of these problems and provides an option to view the documents as HTML. Users selecting this option can view the document in HTML (without formatting) on Google's server. This gives you a rough idea of the document content so you can decide whether or not it's worth downloading.

Optimizing Alternative-File-Format Documents

As the major search engines show more and more documents in various file formats in their search results, it's a good idea to start optimizing the alternative-file-format documents that you want indexed. Leave out those that could frustrate users; a robots.txt file can keep search engines from indexing them.

Title and Description Tags: A good way to start is to ensure that every file has a title and a description. The document title is one of the first things search engine spiders encounter when indexing your pages. If they don't find one, they'll create one that may not be as advantageous as your own.

Sometimes you can embed meta information that is read by search engines when creating your files in the various file formats. For instance, MS Word lets you select a document title when you create the file. If you don't select a title, the first words to appear can be saved as the title, but this may not be your best choice. Keep in mind that title selection is possible in MS Word, PowerPoint, Adobe Acrobat 5.0, and other software programs.

It's wise to provide a good description for PDF files because many users hesitate to download them unless they really want the data. One way to do this is to create an HTML version, which can then be converted to PDF. There are free tools available that will perform the conversion for you. HTMLDOC 1.8.20 from Easy Software Products is one such tool, but there are others. This way, you can create a good title and description that will be picked up by search engines while taking advantage of PDF format for document delivery.
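
Before converting an HTML version to PDF, it may help to confirm that the title and description are actually present. The following is a minimal sketch in Python using only the standard library. The function name check_title_and_description and the sample markup are illustrative assumptions, and the sketch does not check whether a particular converter carries these fields into the PDF.

```python
from html.parser import HTMLParser


class TitleDescriptionChecker(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title_and_description(html: str) -> list:
    """Return a list of problems; an empty list means both fields exist."""
    checker = TitleDescriptionChecker()
    checker.feed(html)
    checker.close()
    problems = []
    if not checker.title.strip():
        problems.append("missing title")
    if not checker.description:
        problems.append("missing meta description")
    return problems


# Illustrative sample pages (not taken from a real site).
GOOD_HTML = (
    "<html><head><title>Optimizing Your PDF Documents</title>"
    '<meta name="description" content="How to prepare PDF files for search engines.">'
    "</head><body><p>Body text.</p></body></html>"
)
BAD_HTML = "<html><head></head><body><p>No metadata here.</p></body></html>"

if __name__ == "__main__":
    print(check_title_and_description(GOOD_HTML))
    print(check_title_and_description(BAD_HTML))
```

A check like this could be run over every HTML source file so that no document is converted with an empty or missing title.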

Optimizing Image Files: Another good strategy is to get your image files indexed, which gives you one more opportunity to promote your site without much extra effort.

  • Image File Names: Be descriptive when assigning file names to your image files, using keywords if possible. This helps with general maintenance and can also boost your keyword density.

  • ALT Tags: When you write text descriptions for your graphic images, make them meaningful and use complete sentences with strategic keywords rather than listing a string of keywords.

  • Page Content: The content surrounding the images on the page should also be keyword-rich and relevant to the image. This helps search engines put the image into context, because the spider reads the surrounding text when indexing images. So integrate your ALT tag descriptions with the surrounding page text.
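
The image guidelines above can be turned into a quick audit of a page. The following is a minimal sketch in Python using only the standard library. It flags images with no ALT text and images whose file names look generic. The pattern used to decide what counts as "generic" (names such as img001 or dsc_1234) is an assumed heuristic, and the sample markup is hypothetical.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed heuristic: an empty name, a bare number, or a stock prefix
# followed only by digits says nothing about the image content.
GENERIC_NAME = re.compile(r"^(img|image|pic|photo|dsc)?[\s_-]*\d*$", re.IGNORECASE)


class ImageAudit(HTMLParser):
    """Records (src, problem) pairs for every <img> tag that needs work."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        alt = (attrs.get("alt") or "").strip()
        # File name without directory or extension, e.g. "img001".
        stem = posixpath.splitext(posixpath.basename(urlparse(src).path))[0]
        if GENERIC_NAME.match(stem):
            self.findings.append((src, "generic file name"))
        if not alt:
            self.findings.append((src, "missing alt text"))


def audit_images(html: str) -> list:
    """Return a list of (src, problem) tuples found in the markup."""
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.findings


# Hypothetical page fragment used only for illustration.
SAMPLE_HTML = """
<p>Our red widget ships fully assembled.</p>
<img src="/images/red-widget-assembled.jpg"
     alt="A fully assembled red widget on a workbench.">
<img src="/images/img001.jpg">
"""

if __name__ == "__main__":
    for src, problem in audit_images(SAMPLE_HTML):
        print(src, "->", problem)
```

The audit only finds candidates for improvement; writing a descriptive name and a meaningful sentence for each image is still a manual job.
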

To get more of your content indexed, consider carefully which alternative-file-format documents to optimize. Go over the limitations and decide whether the download is worth the user's effort. To avoid user frustration, ensure the document contains valuable, current information and optimize it with relevant descriptions.
July 17, 2002

Paul J. Bruemmer has provided search engine marketing expertise and consulting services to prominent American businesses since 1995. As Director of Search Marketing at Red Door Interactive, he is responsible for strategizing and implementing search engine marketing activities within Red Door's Internet Presence Management (IPM) services.
