Re: Question on how Swish-e is parsing words out of a

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 18 2003 - 06:13:44 GMT
On Wed, Dec 17, 2003 at 09:51:46PM -0800, David Wood wrote:
> For the HTML doc content listed at the bottom of this message, if I run:

> /opt/swish-e/bin/swish-e -T PARSED_WORDS -v 3 -i blah.html -f blah.idx
> White-space found word 'product_families=Monitors,Desktop'

> Why is Swish-e finding the "words" listed above, for example, 
> 'product_families=Monitors,Desktop'?  Neither '_' nor '=' is in WORDCHARS, 
> so those strings should be getting broken into component words, shouldn't they?

Try -T indexed_words instead of -T parsed_words.  All -T parsed_words
does is show the white-space delimited words.

Swish first splits the text into white-space delimited "words", then it
uses WordCharacters and other tests to split up those words into what
gets indexed.
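
To make the two stages concrete, here is a rough Python sketch. It is not
Swish-e's actual code: the function names are made up for illustration, the
character set is an assumed stand-in for whatever WordCharacters is set to in
your config, and the "other tests" mentioned above are left out entirely.

```python
import re
import string

# Assumed stand-in for a WordCharacters setting: letters and digits only.
# Neither '_' nor '=' nor ',' is included, matching the question above.
WORD_CHARS = string.ascii_letters + string.digits


def whitespace_words(text):
    """Stage 1: split on white space only (roughly what parsed_words shows)."""
    return text.split()


def indexed_words(text, word_chars=WORD_CHARS):
    """Stage 2: break each white-space word into runs of word characters."""
    # Build a character class from the allowed characters; re.escape keeps
    # any special characters in word_chars from being read as regex syntax.
    pattern = re.compile("[" + re.escape(word_chars) + "]+")
    words = []
    for chunk in whitespace_words(text):
        words.extend(pattern.findall(chunk))
    return words


if __name__ == "__main__":
    sample = "product_families=Monitors,Desktop"
    print(whitespace_words(sample))  # one white-space delimited "word"
    print(indexed_words(sample))     # the component words after stage 2
```

With that assumed character set, the first stage leaves
'product_families=Monitors,Desktop' as a single "word", and only the second
stage breaks it at '_', '=' and ','.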


-- 
Bill Moseley
moseley@hank.org
Received on Thu Dec 18 06:13:51 2003