Re: Bolding search items on indexed page

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Nov 13 2001 - 00:13:24 GMT
At 03:36 PM 11/12/01 -0800, SRE wrote:
>But wait - doesn't the index already contain a word position
>of some sort? That would reduce the problem to just counting
>words properly, not all the stemming and matching.

I don't know the search.c code well enough to answer fully.  The word
position in the index is just a relative number, so if swish told you that
it found the word "foo" at position 232, that wouldn't tell you much about
where it is in the source document.  You can also imagine how much swish
would print for each result if it listed every word it found and every
position it found it at.

BTW, this is off topic, but I've always thought it would be nice to have
this output:

> ./swish-e -w foo bar -m 1
# SWISH format: 2.1-dev-24
# Search words: foo bar
# Word: foo 2322
# Word: bar 1231
# Number of hits: 123
# Search time: 0.000 seconds
# Run time: 0.005 seconds
..

I'd also like:
# SWISH format: 2.1-dev-24
# Search words: foo baz
# Word: foo 2322 files
# Word: baz 0 files
err: no results

So I could see which words caused the query to fail.
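
A minimal sketch of how a wrapper script might use such per-word headers.
It assumes the hypothetical "# Word: <word> <count> files" format wished
for above, which swish does not print today; failed_words is an
illustrative name, not part of swish.

```python
import re

# Matches the hypothetical header proposed above:
#   "# Word: <word> <count>"  or  "# Word: <word> <count> files"
WORD_HEADER = re.compile(r"^# Word:\s+(\S+)\s+(\d+)(?:\s+files)?\s*$")


def failed_words(output):
    """Return the words whose reported file count is zero, in output order."""
    failed = []
    for line in output.splitlines():
        match = WORD_HEADER.match(line)
        if match and int(match.group(2)) == 0:
            failed.append(match.group(1))
    return failed
```

A caller would pass in the captured header text and report the returned
words back to the user as the ones that matched nothing.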

BTW, in case anyone ever noticed:

> ./swish-e -w foo-bar -H 9  
# WordCharacters: 
# IgnoreFirstChar: 
# IgnoreLastChar: 
# StopWords:
# BuzzWords:
# Search Words: foo-bar
# Parsed Words: foo bar 

All those headers have found their way into swish because of the needs of
highlighting.  I use Parsed Words to know what words swish is using, and
the WordCharacters and Ignore*Char to split up the source text, and
StopWords to know what to ignore.
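
Roughly, a highlighter built on those headers could look like the sketch
below.  This is not the code swish or any shipped script uses.  It assumes
the header values have already been read into plain strings and lists, that
matching is case-insensitive, and that hits are marked with asterisks; the
function name and parameters are made up for illustration.

```python
def highlight(text, parsed_words, word_chars,
              ignore_first="", ignore_last="", stop_words=()):
    """Mark every occurrence of a parsed search word in text with *...*.

    word_chars   -- characters that may appear in a word (WordCharacters)
    ignore_first -- characters stripped from the start of a word
    ignore_last  -- characters stripped from the end of a word
    stop_words   -- words that are never highlighted
    """
    wanted = {w.lower() for w in parsed_words}
    wanted -= {s.lower() for s in stop_words}
    allowed = set(word_chars.lower())

    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i].lower() not in allowed:
            # Not a word character: copy it through unchanged.
            out.append(text[i])
            i += 1
            continue

        # Collect a run of word characters.
        j = i
        while j < n and text[j].lower() in allowed:
            j += 1
        token = text[i:j]

        # Strip the ignorable leading and trailing characters,
        # remembering how many were removed so they can be put back.
        core = token.lstrip(ignore_first)
        lead = len(token) - len(core)
        stripped = core.rstrip(ignore_last)
        trail = len(core) - len(stripped)

        if stripped and stripped.lower() in wanted:
            out.append(token[:lead] + "*" + stripped + "*"
                       + token[len(token) - trail:])
        else:
            out.append(token)
        i = j
    return "".join(out)
```

With word_chars set to the lowercase letters and parsed_words set to
["foo", "bar"], the text "Foo-bar and the baz." comes back as
"*Foo*-*bar* and the baz.", which mirrors how the "foo-bar" query above was
parsed into two words.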


Bill Moseley
mailto:moseley@hank.org
Received on Tue Nov 13 00:13:55 2001