If I'm not mistaken from looking at the SWISH-E source code, it
seems as though the total number of words in a file (used in
ranking calculations) is the total number of INDEXED words and
not the total number of ACTUAL words in a file.
If the number of indexed words is used, I think that would
yield (erroneously?) higher ranks for words in a given document
than if the number of actual words were used instead.
1. Is this correct?
2. If so, can a justification be given as to why the number of
indexed words should be used as opposed to the number of
actual words?
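
To make the concern concrete, here is a minimal sketch in Python. It
assumes, purely for illustration, that a word's rank is proportional to
its frequency divided by the file's total word count. That formula is
not taken from the SWISH-E source, and the stopword list and function
names are hypothetical.

```python
# Hypothetical stopword list: words that are skipped and never indexed.
STOPWORDS = {"the", "of", "a", "is", "in"}


def count_words(text, stopwords):
    """Return (actual_word_count, indexed_word_count) for a text."""
    words = text.lower().split()
    actual = len(words)
    indexed = sum(1 for w in words if w not in stopwords)
    return actual, indexed


def naive_rank(term_count, total_words):
    """Assumed ranking: term frequency normalized by total words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return term_count / total_words


text = "the rank of a word in the file is the count of the word"
actual, indexed = count_words(text, STOPWORDS)
term_count = text.lower().split().count("word")

rank_by_indexed = naive_rank(term_count, indexed)
rank_by_actual = naive_rank(term_count, actual)

print("actual words: ", actual)
print("indexed words:", indexed)
print("rank using indexed count:", rank_by_indexed)
print("rank using actual count: ", rank_by_actual)
```

Because the indexed count can never exceed the actual count, the rank
computed from it is always at least as high under this assumed formula.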
- Paul
Received on Fri May 14 17:15:10 1999