> Is an option, but we decided to de-couple PDF
> hit-highlighting from any search engine so we have more
> flexibility in switching search engines (even make PDF
> hit-highlighting possible in Google, Yahoo, MSN, ...).
Hmm... A search-engine-independent solution! That's an interesting idea. So
you actually keep a second file (.LST) for *each* PDF you include in the index?
Of course, this roughly doubles the required disk space and means maintaining
the "DB" (the .LST files), but the flexibility it provides is definitely an
advantage.
Have you considered using an off-the-shelf database (e.g. MySQL) instead of
the .LST files? I'm not sure it would be a good idea, but the PDF files I
will be indexing are huge (hundreds of MB each), and I am concerned about the
access time of doing two searches for each user query (first in swish and then
the .LST lookup). Any thoughts?
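To make the question concrete, here is a rough sketch of the kind of database
lookup I have in mind. Everything in it is an assumption on my part: I don't
know what the .LST files actually contain, so the (pdf, word, page, pos) layout,
the table name and both function names are hypothetical, and I used Python's
built-in sqlite3 module rather than MySQL only to keep the example
self-contained.

```python
import sqlite3


def build_word_db(entries):
    """Load word positions into an indexed table.

    `entries` is an iterable of (pdf, word, page, pos) tuples. This layout is
    a guess for illustration; it is not the real .LST format.
    """
    conn = sqlite3.connect(":memory:")  # a real setup would use a file or server
    conn.execute(
        "CREATE TABLE word_pos (pdf TEXT, word TEXT, page INTEGER, pos INTEGER)"
    )
    # Composite index so the second lookup does not scan a whole document's rows.
    conn.execute("CREATE INDEX idx_pdf_word ON word_pos (pdf, word)")
    conn.executemany(
        "INSERT INTO word_pos VALUES (?, ?, ?, ?)",
        ((pdf, word.lower(), page, pos) for pdf, word, page, pos in entries),
    )
    conn.commit()
    return conn


def lookup_hits(conn, pdf, word):
    """Return (page, pos) pairs for `word` in one PDF, ordered by position."""
    cursor = conn.execute(
        "SELECT page, pos FROM word_pos WHERE pdf = ? AND word = ? "
        "ORDER BY page, pos",
        (pdf, word.lower()),
    )
    return cursor.fetchall()
```

The idea would be that the search engine returns the matching PDF, and the
highlighter then asks the database only for that document's positions of the
query words, instead of reading through a large per-document file.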
Thanks for the feedback!
Received on Wed Jun 28 13:30:07 2006