At 06:01 PM 05/30/01 -0700, Greg Caulton wrote:
>One question, has anyone written any perl that can provide me with a sample
>from the document (similar to Google's output)?
Yes. There will be an example CGI script in the source distribution that will
provide some form of term highlighting, and context output (showing a few
words on either side of the matched term(s)).
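
To give a rough idea of what "context output" with highlighting means, here is a minimal sketch. It is written in Python for brevity and is not the script from the distribution; the function name `context_snippet`, the `width` parameter, and the `<b>` tags are all assumptions made up for this illustration.

```python
import html
import re


def _bare(word):
    """Lower-case a word and strip leading/trailing punctuation."""
    return re.sub(r"^\W+|\W+$", "", word).lower()


def context_snippet(text, terms, width=3, open_tag="<b>", close_tag="</b>"):
    """Return `width` words on either side of each matched term.

    Matched words are wrapped in open_tag/close_tag, and "..." marks
    places where text was left out. Returns "" if nothing matched.
    """
    words = text.split()
    wanted = {t.lower() for t in terms}
    hits = {i for i, w in enumerate(words) if _bare(w) in wanted}

    # Collect every word position that falls inside some match window.
    keep = sorted({j for i in hits
                   for j in range(max(0, i - width),
                                  min(len(words), i + width + 1))})

    out = []
    prev = -1
    for j in keep:
        if j != prev + 1:          # gap before this word
            out.append("...")
        word = html.escape(words[j])   # safe to drop into an HTML page
        out.append(f"{open_tag}{word}{close_tag}" if j in hits else word)
        prev = j
    if keep and keep[-1] != len(words) - 1:
        out.append("...")          # text continues after the last window
    return " ".join(out)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog near the river bank today"
    print(context_snippet(sample, ["fox"], width=2))
```

Note that this matches single words only, so it has the same weakness with phrases as described further down.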
Don't get too excited, as there are some open issues. The example script
will use swish-e's "StoreDescription" feature. This feature can be used to
store a few hundred characters of a document in the index file. I'm not
clear on the implications of storing the entire contents of every document
within the index. It would probably be better to store the (extracted)
text from the documents in another database (e.g. Berkeley DB). This could
be especially useful if you use stemming in your index, since the DB could
contain both the original text, and the text in stemmed format for matching
against the query. Currently, swish-e does not report which specific
word matched in a document, so the script has to find the matches itself.
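
As an illustration of the "original text plus stemmed text" idea, here is a small sketch, again in Python. Everything in it is an assumption for the example: `toy_stem` is a crude suffix stripper standing in for a real stemmer (it is not the stemmer swish-e uses), and a plain dict stands in for the external database such as Berkeley DB.

```python
def toy_stem(word):
    """Crude suffix stripper; a hypothetical stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def make_record(text):
    """Keep the original words next to their stemmed forms, position by position."""
    words = text.split()
    stemmed = [toy_stem(w.strip(".,;:!?\"'()")) for w in words]
    return {"original": words, "stemmed": stemmed}


def matching_positions(record, query):
    """Return the positions whose stemmed form equals a stemmed query word."""
    query_stems = {toy_stem(q) for q in query.split()}
    return [i for i, s in enumerate(record["stemmed"]) if s in query_stems]


if __name__ == "__main__":
    # A dict keyed by file name plays the role of the external database here.
    store = {"doc1.html": make_record("Indexing stemmed words helps searches.")}
    record = store["doc1.html"]
    for pos in matching_positions(record, "index search"):
        print(pos, record["original"][pos])
```

Because the two word lists are kept in step, a match found in the stemmed list gives the position of the original word to highlight.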
There are also minor details; for example, the script probably will not
highlight phrases correctly.
The script should be available in a few days.
Received on Thu May 31 07:31:15 2001