RE: Context

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Aug 27 2003 - 20:07:51 GMT

On Wed, Aug 27, 2003 at 12:44:38PM -0700, Peter Holm wrote:
> On Wed, 27 Aug 2003 07:06:25 -0700 (PDT), you wrote:
> 
> >And the place to look is the swish.cgi script
> 
> ahh, so that "context" feature is implemented in the
> example CGI script? OK, that means I must reimplement it, because I
> am using swish-e directly from a PHP script.

Yes, if it's PHP, that's something you will have to do yourself.
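
As a rough idea of what such a reimplementation involves, here is a
minimal sketch in Python (easy enough to port to PHP). It is not what
swish.cgi does; the function name context_snippet and the width
parameter are made up for illustration, and it assumes you can read the
original document text yourself and only care about the first match of
a single word.

```python
import re

def context_snippet(text, term, width=5):
    """Return up to `width` words on either side of the first match of `term`."""
    words = text.split()
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring leading/trailing punctuation.
        if re.sub(r"^\W+|\W+$", "", word).lower() == term.lower():
            start = max(0, i - width)
            end = min(len(words), i + width + 1)
            return " ".join(words[start:end])
    return None  # the term does not occur in the text

doc = ("Swish-e builds a reverse index. You ask for a word "
       "and it tells you what file it's in.")
print(context_snippet(doc, "word", width=3))
```

Phrases, wildcards, and boolean queries would all need extra handling
on top of this.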

> Wouldn't it be more efficient to let the search engine itself return
> the context from the indexed words? I don't know if I am saying
> something stupid here...

Swish-e builds a reverse index.  You ask for a word and it tells you 
what file it's in.  It would require a bit of work to go through that 
reverse index and rebuild the original document.  Imagine trying to 
rebuild an entire book from just the index.
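
To make that concrete, here is a toy version of such an index in
Python. It is only a sketch of the general idea, not swish-e's actual
index format; build_index and the file names are invented for the
example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of files that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
}
index = build_index(docs)

# Lookup goes from word to files; word order within a file is not stored.
print(sorted(index["the"]))
print(sorted(index["fox"]))
```

Going the other way, from a file back to its text, means scanning every
word in the index, and even then the order of the words is gone.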

It's possible that swish-e could tell you the word positions for each
word/document pair, but that could be thousands of positions for a
single document. On top of that, the word positions probably wouldn't
match up to real word positions because of the way swish-e numbers
them. And then there's boolean matching, phrase searches, and
wildcards to deal with.

Not too long ago there was an article in something like Sysadmin where a
search engine was built by indexing individual words in a MySQL table. 
You could then select all words for a given file and sort by word
position to get the original document back -- well not the original
document, but a list of words in the right order.  I assume that would
be somewhat slow.
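
I don't have the article's code, but the idea would look something like
the sketch below. It uses Python's built-in sqlite3 module in place of
MySQL so it runs on its own; the table layout and the function names
index_file and rebuild are my assumptions, not the article's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per word occurrence: which file, where in the file, which word.
conn.execute("CREATE TABLE words (file TEXT, position INTEGER, word TEXT)")

def index_file(conn, name, text):
    """Store every word of `text` along with its position in the file."""
    rows = [(name, pos, word) for pos, word in enumerate(text.split())]
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

def rebuild(conn, name):
    """Return the words of one file in their original order."""
    cur = conn.execute(
        "SELECT word FROM words WHERE file = ? ORDER BY position", (name,)
    )
    return " ".join(row[0] for row in cur)

index_file(conn, "a.txt", "imagine rebuilding a book from its index")
print(rebuild(conn, "a.txt"))
```

That is one row per word occurrence, so the table gets large quickly
and every rebuild is a sort over all the rows for that file.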



-- 
Bill Moseley
moseley@hank.org
Received on Wed Aug 27 20:08:00 2003