| Why optimize?
First, why would anyone want to optimize their PDF files for
search engines? Well, if you have an eBook, brochure, product
description or technical document in PDF format, you may wish to
optimize it to pick up some extra search engine traffic.
Can the search engines read PDF files?
Yes, most of the major search engines can now read the basic
contents of PDF files, though whether these files can rank
as well as HTML pages is still an open question.
How is it supposed to work?
Here is the intended workflow. Create your file in MS Word, or
in a drawing or page layout program whose output can later be
distilled into a PDF (with some applications you will have to
create an EPS file first and then distill it; with others, you
can distill right out of the application). If you are using a
program such as MS Word, be mindful to apply the Heading 1,
Heading 2 and Heading 3 styles (the equivalent of H1, H2 and H3
tags) where necessary and optimize the body text as you would
an HTML file.
When you are finished, distill the file. Bring this file
into the full version of Adobe Acrobat 6 for editing. Plug
in the appropriate content, post the PDF on your website and
let the search engine robots index the file.
How do I plug in the appropriate content?
In Adobe Acrobat 6 there are two places to input content
into a PDF file. The first place is under File / Document
Properties and the second place is under Advanced
/ Document Metadata. Under File / Document Properties
there are several menus but the most relevant for our purposes
is the Description menu. Under the Description menu, there
are fields for Title, Author, Subject and Keywords.

Now, to confuse matters more, let's go over to the Advanced
/ Document Metadata menu. There are a couple of choices here,
but let's once again look at the Description menu. Under
this Description menu, there are fields for Title, Author,
Description, Description Writer, Keywords, Copyright State,
Copyright Notice and Copyright Info URL.

How does the PDF store the data?
With duplicate fields, it is important to find out how the
data is stored so that we can make some educated guesses as
to how the search engines read it. I performed a few small
experiments, and here is what I found. The Title and Author
fields in the two menus seem to be linked to their counterparts:
when you change one under Document Properties and then check
the same field under Document Metadata, you will see that it
has changed too. The Subject field of the Document Properties
menu seems to be linked to the Description field of the Document
Metadata menu in the same way. The Keywords fields, however,
are not linked. Separate sets of keywords can be added to
both fields. When the file is saved, both sets of keywords
are stored in the PDF file.
Which set of keywords is correct then?
Adobe stores its metadata in XML format. Opening the PDF
file in Notepad, it appears that the Keywords field under Document
Properties is the one that the search engines will use (this
hasn't been proven yet, though). The keywords input into
this field appear in the PDF as we have come to expect, separated
by commas, like this: Keywords(movies, cinemas, matinees,
theatres, popcorn).
The keywords that were input into the Document Metadata menu
appear as a sort of list, like this: <rdf:li>trees</rdf:li><rdf:li>wood</rdf:li><rdf:li>chips</rdf:li>
Of course, this doesn't really mean anything on its own;
it is how the search engines read this that counts.
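
For anyone who would rather not squint at a PDF in Notepad, the same check can be sketched in a few lines of Python. This is only a rough sketch under several assumptions: the function name extract_keyword_sets and the sample bytes are made up for illustration, the Document Properties keywords are assumed to sit in a plain /Keywords(...) entry, the Document Metadata list is assumed to sit inside a dc:subject element, and the metadata is assumed to be stored uncompressed. A PDF saved with compressed or encrypted objects would need a real PDF library instead.

```python
import re


def extract_keyword_sets(pdf_bytes: bytes) -> dict:
    """Pull both keyword sets out of raw, uncompressed PDF bytes.

    Hypothetical helper for illustration only. It does not handle
    escaped parentheses, hex strings, or compressed objects.
    """
    # Document Properties keywords: a comma-separated string
    # stored after the /Keywords key.
    info_keywords = []
    info_match = re.search(rb"/Keywords\s*\(([^)]*)\)", pdf_bytes)
    if info_match:
        raw = info_match.group(1).decode("latin-1")
        info_keywords = [k.strip() for k in raw.split(",") if k.strip()]

    # Document Metadata keywords: one <rdf:li> element per keyword.
    # Assumption: the list lives inside a <dc:subject> element.
    xmp_keywords = []
    subject_match = re.search(
        rb"<dc:subject>(.*?)</dc:subject>", pdf_bytes, re.DOTALL
    )
    if subject_match:
        items = re.findall(
            rb"<rdf:li[^>]*>([^<]*)</rdf:li>", subject_match.group(1)
        )
        xmp_keywords = [i.decode("utf-8", errors="replace").strip() for i in items]

    return {
        "document_properties": info_keywords,
        "document_metadata": xmp_keywords,
    }


# Made-up sample that mimics the two snippets quoted above.
sample = (
    b"<< /Keywords(movies, cinemas, matinees, theatres, popcorn) >>\n"
    b"<dc:subject><rdf:Bag><rdf:li>trees</rdf:li><rdf:li>wood</rdf:li>"
    b"<rdf:li>chips</rdf:li></rdf:Bag></dc:subject>"
)
result = extract_keyword_sets(sample)
print(result["document_properties"])
print(result["document_metadata"])
```

Running this against your own files (read with open(path, "rb").read()) only tells you what is stored in the PDF. It says nothing about which set a search engine actually uses.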
How does it really work?
I've run some preliminary tests (and by this I mean
very preliminary), and more testing will be needed
to verify these results, but here is what I have come up with
so far. When a PDF file was first opened in Acrobat 6, the
Document Properties and Document Metadata Title and Author
fields were already filled in with the file name and the author's
initials (information received from MS Word).
Without any extra data filled in under the Document Properties
or Document Metadata menu, Google used the Title field information
for the title in the results, and the description in the results
was taken from the body copy. For older PDFs, Yahoo!
used the largest text on the page as the title text. For
more recently indexed PDF documents, however, Yahoo! is
using the Title field information as the title text in the
search results. At this writing, the description text in the
search engine results comes from the body text of the PDF
and not from the Document Properties or Document Metadata text.
Thinking I might just get lucky (and hoping for quick results),
I ran a few optimized and non-optimized PDFs through
some of the more popular search engine spider simulators on
the web, but these spiders did not handle the binary code
very well. None of them returned title or meta tag information
and the most popular keywords were snippets of binary code.
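
A likely explanation is that these simulators treat every download as HTML text. The sketch below is my own illustration, not how any particular simulator works: it shows a check a tool could run first, using the fact that PDF files begin with the marker %PDF-. The function names and sample bytes are hypothetical.

```python
import string

# Byte values that count as ordinary printable ASCII text.
PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_pdf(data: bytes) -> bool:
    """PDF files start with the marker %PDF- followed by a version."""
    return data.startswith(b"%PDF-")


def non_printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are not printable ASCII."""
    if not data:
        return 0.0
    return sum(1 for b in data if b not in PRINTABLE) / len(data)


# Made-up samples: a tiny HTML page and a fake PDF full of binary bytes.
html_sample = b"<html><title>Movies</title></html>"
pdf_sample = b"%PDF-1.5\n" + bytes(range(128, 256))

for name, data in (("html", html_sample), ("pdf", pdf_sample)):
    print(name, looks_like_pdf(data), round(non_printable_ratio(data), 2))
```

A tool that counts words in a file with a high share of non-printable bytes will report fragments of binary data as keywords, which matches what I saw.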
So, at this point, does it really pay to optimize a PDF?
The simple answer is yes. The title tag and body copy can
still be optimized, and the major search engines will index
them accordingly. As for the Keywords and Description meta
tags, well, Google ignores these in PDFs just as it does
in HTML documents, and Yahoo!, which does use the description
tag, is only halfway to where it needs to be.
But Google and Yahoo! aren't the only two search engines
/ directories around, and with algorithms changing all the
time, perhaps someday soon either the search engines will be
able to fully read a PDF file or Adobe will offer a patch that
will make PDFs more search engine friendly. It's only a matter
of time, my friend. Will you be ready?
|