On Wed, Dec 17, 2003 at 09:51:46PM -0800, David Wood wrote:
> For the HTML doc content listed at the bottom of this message, if I run:
> /opt/swish-e/bin/swish-e -T PARSED_WORDS -v 3 -i blah.html -f blah.idx
> White-space found word 'product_families=Monitors,Desktop'
> Why is Swish-e finding the "words" listed above, for example,
> 'product_families=Monitors,Desktop'? Neither '_' nor '=' is in WORDCHARS,
> so those strings should be getting broken into component words, shouldn't they?
Try -T indexed_words instead of -T parsed_words. PARSED_WORDS only
shows the whitespace-delimited "words".
Swish first splits the text into whitespace-delimited "words", then it
uses WordCharacters and other tests to split those words into the
terms that actually get indexed.
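The two stages can be sketched in Python. This is a simplified illustration, not Swish-e's actual code. The WORD_CHARACTERS set (ASCII letters and digits) is a hypothetical stand-in for whatever WordCharacters is configured to, and the "other tests" Swish-e applies are left out.

```python
import re
import string

# Hypothetical WordCharacters setting: ASCII letters and digits only.
# '_', '=' and ',' are deliberately NOT included, as in the question.
WORD_CHARACTERS = string.ascii_letters + string.digits


def parsed_words(text):
    """Stage 1: split on whitespace only (roughly what PARSED_WORDS shows)."""
    return text.split()


def indexed_words(text, word_chars=WORD_CHARACTERS):
    """Stage 2: split each whitespace chunk into runs of word characters."""
    # One or more consecutive characters from the WordCharacters set.
    pattern = re.compile("[" + re.escape(word_chars) + "]+")
    words = []
    for chunk in parsed_words(text):
        words.extend(m.group(0) for m in pattern.finditer(chunk))
    return words


if __name__ == "__main__":
    sample = "product_families=Monitors,Desktop"
    print(parsed_words(sample))   # ['product_families=Monitors,Desktop']
    print(indexed_words(sample))  # ['product', 'families', 'Monitors', 'Desktop']
```

The first stage leaves 'product_families=Monitors,Desktop' intact because it contains no whitespace. Only the second stage breaks it apart at the non-word characters.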
--
Bill Moseley
moseley@hank.org
Received on Thu Dec 18 06:13:51 2003