so since i cannot duplicate the error on a smaller test index of currently
3000 files, and i suspect that doing -T indexed_words against the full
1,011,839 files (woo, looky there... 1 million!) will be pretty
cumbersome, can i just pipe the output through grep and assume that if
swishdefault gets the words corey and rich, then the phrase "corey rich"
should also be searchable?
sorry as always for taking up so much of everyone's time.
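
for what it's worth, here is a rough sketch of the kind of filter i have in mind, in python rather than grep. it reads the -T indexed_words output on stdin and counts the lines on which each target word shows up as a whole token. i am not assuming anything about the exact layout of the debug output beyond "one or more words per line", and the function name find_words is just something i made up. note that it only checks that the words are present: phrase matching presumably also depends on word positions, so this would be a necessary check but maybe not a sufficient one.

```python
import re
import sys


def find_words(lines, targets):
    """Count the lines on which each target appears as a whole token.

    Matching is case-insensitive, and a target only counts when it is not
    glued to other letters or digits (so "rich" does not match "enrich").
    The layout of the debug output is an assumption: any text lines work.
    """
    patterns = {
        word: re.compile(
            r"(?<![A-Za-z0-9])" + re.escape(word) + r"(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
        for word in targets
    }
    counts = {word: 0 for word in targets}
    for line in lines:
        for word, pattern in patterns.items():
            if pattern.search(line):
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # Words to look for come from the command line; default to the two
    # words from the failing phrase search.
    targets = sys.argv[1:] or ["corey", "rich"]
    for word, count in find_words(sys.stdin, targets).items():
        print(f"{word}: {count}")
```

the idea would be to pipe the indexing run's -T indexed_words output into this script (saved under whatever name) and see whether both counts come back non-zero.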
On Wed, 8 Nov 2006, Bill Moseley wrote:
> On Wed, Nov 08, 2006 at 02:03:56PM -0500, brad miele wrote:
>> ok, so just to make sure i do this correctly, i will build two indexes
>> with 2.4.4, the smaller test, and the full.
>> then, i will do -T INDEX_WORDS > a file for each and look at how the
>> phrase (Corey Rich) is represented?
> Not exactly. While indexing I'd use -T indexed_words -- that would
> show what words are being placed into the index as you are
> indexing. Then you would be able to say, yes those words are being
> parsed and added to the index.
> Then after indexing use -T index_words to make sure they actually got
> into the index and are indexed under the correct metanames, etc.
> At that point if searching doesn't work then we know it's a problem
> with how the search code is accessing the index to find the words in
>> also, something in the back of my head is screaming "stemming" at me, but
>> i am not sure why... maybe just a migraine.
> Possible. -T indexed_words and -T index_words would show you the
> stemmed words (-T parsed_words, iirc, would show you the pre-stemmed
> words).
> Then when searching use -H9 to show you what swish is searching for --
> that is, that it's searching using the stemmed words.
> You have more experience with your indexes than I do, but my general
> feeling is stemming and stop words are not always such a great thing.
> Bill Moseley
Received on Wed Nov 8 12:44:08 2006