RE: Hit-highlighting of PDF files

From: Eric Jobidon <eric(at)not-real.neopaper.net>
Date: Wed Jun 28 2006 - 20:30:02 GMT
> Is an option, but we decided to de-couple PDF 
> hit-highlighting from any search engine so we have more 
> flexibility in switching search engines (even make PDF 
> hit-highlighting possible in Google, Yahoo, MSN, ...).
> 

Hmm... A search-engine-independent solution! That's an interesting idea. So
you actually keep a second file (.LST) for *each* PDF you include in the
index? Of course this forces one to double the required disk space and
maintain the "DB" (the .LST files), but the flexibility it provides is
definitely an asset.

Have you considered using an off-the-shelf database (e.g. MySQL) instead of
the .LST files? I'm not sure it would be a good idea, but the PDF files I
will be indexing are huge (hundreds of MB each), and I am concerned about the
access time of doing two lookups for each user query (first the search in
swish, then the .LST lookup). Any thoughts?
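
To make the database idea concrete, here is a rough sketch of the second
lookup done as an indexed query. Everything in it is an assumption: I don't
know what the .LST files actually contain, so the table layout (document,
term, page, position) and the function names are made up for illustration,
and SQLite stands in for MySQL only so the example runs without a server.

```python
import sqlite3


def build_db(conn):
    """Create a hypothetical hit-position table plus a (doc, term) index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " doc TEXT NOT NULL,"      # path of the PDF returned by the search engine
        " term TEXT NOT NULL,"     # a word that occurs in that PDF
        " page INTEGER NOT NULL,"  # page on which the word occurs
        " pos INTEGER NOT NULL)"   # position of the word within the page
    )
    # The index is what keeps the second lookup cheap for very large PDFs:
    # the query touches only the rows for one (doc, term) pair.
    conn.execute("CREATE INDEX IF NOT EXISTS hits_doc_term ON hits (doc, term)")


def add_hits(conn, doc, term, positions):
    """Store (page, pos) pairs for one term in one document."""
    conn.executemany(
        "INSERT INTO hits (doc, term, page, pos) VALUES (?, ?, ?, ?)",
        [(doc, term, page, pos) for page, pos in positions],
    )


def lookup(conn, doc, term):
    """Second step of a query: fetch positions to highlight, in reading order."""
    cur = conn.execute(
        "SELECT page, pos FROM hits WHERE doc = ? AND term = ?"
        " ORDER BY page, pos",
        (doc, term),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_db(conn)
    add_hits(conn, "manual.pdf", "highlight", [(3, 120), (1, 42)])
    print(lookup(conn, "manual.pdf", "highlight"))
```

The first step (the swish search) would supply the document path and the query
terms; whether one indexed query like this beats opening a per-document .LST
file is exactly what I'm unsure about.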

Thanks for the feedback!

Eric Jobidon
NeoPaper.net
Received on Wed Jun 28 13:30:07 2006