
SWISH::API and highlighting

From: Jonas Wolf <JOWOLF(at)not-real.uk.ibm.com>
Date: Wed Jul 14 2004 - 14:52:32 GMT
I have a problem: when I use a fuzzy index (Stemming_en), the 
SWISH::PhraseHighlight module does not work as expected.

Essentially, after creating the $swish object, the script gets the $results 
of a query. It then executes the following, as detailed in search.cgi:

my %headers = map { lc($_) => ( $swish->HeaderValue( $index, $_ ) || '' ) }
    $swish->HeaderNames;
my $highlighter = SWISH::PhraseHighlight->new( \%highlight_settings, \%headers );
my %parse_query = parse_query( join ' ', $results->ParsedWords( $index ) );
my $phrases = $parse_query{$metaname};
$highlighter->highlight( \$text, $phrases );

This highlights the appropriate pieces of text in $text and works fine 
for non-stemmed indexes.

For fuzzy indexes, it does not work, and I have found the cause of the 
problem. When I execute the following from the command line

X:\cgi-bin\search\modules>swish-e -H 9 -f /path/to/stemmed/index -w memory
# SWISH format: 2.4.2
# Search words: memory
#
# Index File: /path/to/stemmed/index
# ... lots of headers we don't care about
# Fuzzy Mode: Stemming_en
# Search words: memory
# Parsed Words: memori

then we can see why. The parsed words are the stemmed versions of the 
actual search terms. If passed to parse_query, they do not match the 
original search terms, which we obviously want to highlight as well. The 
SWISH::FuzzyWord method does not help either, as we no longer have the 
original search terms.
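
To make the mismatch concrete, here is a minimal standalone sketch. It 
does not use SWISH::API or SWISH::PhraseHighlight at all. The sample text 
and the count_matches helper are made up for illustration, and the literal 
whole-word comparison is only an assumption about how a highlighter might 
compare words, not a description of what SWISH::PhraseHighlight does 
internally.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: a snippet of document text, the word the user
# typed, and the stemmed form swish-e reports under "Parsed Words".
my $text        = 'Not enough memory to complete the operation.';
my $search_word = 'memory';
my $parsed_word = 'memori';

# Count whole-word, case-insensitive occurrences of $word in $string.
# This is an assumed, literal comparison with no stemming of the text.
sub count_matches {
    my ( $word, $string ) = @_;
    my $count = () = $string =~ /\b\Q$word\E\b/gi;
    return $count;
}

my $stemmed_hits  = count_matches( $parsed_word, $text );
my $original_hits = count_matches( $search_word, $text );

print "parsed word '$parsed_word': $stemmed_hits match(es)\n";
print "search word '$search_word': $original_hits match(es)\n";
```

Under that assumption, the stemmed form finds nothing to highlight in the 
text, while the original search word does.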

I have checked the archive for anything about this, but could not find 
anything. Has anyone experienced this before and found a solution? Or am I 
doing something wrong?

Thanks, Jonas
Received on Wed Jul 14 07:52:47 2004