Hit-highlighting of PDF files

From: Eric Jobidon <eric(at)>
Date: Wed Jun 28 2006 - 00:36:46 GMT
Has anyone tried to perform hit-highlighting of PDF files using Swish-E?

What I am referring to is the creation of a "pseudo-xml" file (as specified
by Adobe at
to highlight all the searched words in the selected PDF file (not in the
search results, but in the PDF file itself). The file requires specifying
page number and character offsets from the top of the page to have Acrobat
highlight the words.

So is there a way to query the index file to obtain the list of all
occurrences of a word in a file (like a regular search), but also obtain the
character offsets of each of those occurrences from the beginning of the
page in that document?
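
To make the question concrete, here is a minimal Python sketch of the data I am after. It does not use Swish-E or any Adobe API: it assumes the text of each page has already been extracted into a list of strings by some other tool, and the function name build_offset_map is only illustrative. Offsets are counted in characters from the start of each page's extracted text, which may not match how Acrobat counts them.

```python
import re
from collections import defaultdict

def build_offset_map(pages):
    """Map each lowercased word to a list of (page_number, char_offset, length).

    `pages` is a list of strings, one per page (assumed to come from an
    external text extractor). Pages are numbered from 0 here.
    """
    offsets = defaultdict(list)
    for page_number, text in enumerate(pages):
        # \w+ is a simple stand-in for the indexer's real word-splitting rules.
        for match in re.finditer(r"\w+", text):
            word = match.group().lower()
            offsets[word].append((page_number, match.start(), len(word)))
    return dict(offsets)

if __name__ == "__main__":
    pages = ["Swish-e indexes files.", "Search the index with swish-e."]
    hits = build_offset_map(pages)
    # Each hit is (page, character offset within that page, word length).
    print(hits["swish"])
```

A lookup such as hits["swish"] would then give the page and offset pairs needed to write the highlight file for that document.
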
Customizing the code is an option, but I'm hoping someone else has done
something similar that I could draw inspiration from.

Also, I couldn't find the documentation on the format of the index, but is
there a mechanism to associate arbitrary binary data for each word included
in the index? That way I could store the page and character offset of each
word of a file.

Thanks for any pointers or suggestions!
Eric Jobidon

