Re: [SWISH-E:440] Ranking algorithm is flawed when more than two search terms...

From: Frank Heasley <DrHeasley(at)not-real.chemistry.com>
Date: Wed Aug 12 1998 - 16:44:25 GMT
At 09:26 AM 8/12/98 -0700, Mark Gaulin wrote:
>Hi
>Ranking has got to be a bit of a black art, so I am not about to criticize the
>exact ranking weights, etc, but there does seem to be a basic flaw in the way
>swish-e 1.1 combines the ranks when doing searches of three or more search
>words.
>
>Below is the "proof", but I'll start with the conclusion:
>When searching for three words (using AND), the final rank of a file will be 
>rank(word1) * 25% + rank(word2) * 25% + rank(word3) * 50%
>and this is counter intuitive.

(cut)

Hi Mark,

You're right, this doesn't make much sense.  But if we're going to go to
the trouble, the ranking algorithm(s) should go a bit deeper than even
weighting.
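
To make the weights Mark reports concrete, here is a small Python sketch. It
assumes the ranks are folded together two at a time by averaging, which is only
a guess at how 25% / 25% / 50% could arise (the proof is cut above, and none of
this is taken from the swish-e source). The function names are made up for the
example.

```python
from functools import reduce


def combine_pairwise(ranks):
    """Fold ranks left to right, averaging the running result with the next rank.

    ASSUMPTION: this pairwise fold is a hypothetical explanation for the
    reported weights, not swish-e's actual code.
    """
    return reduce(lambda acc, r: (acc + r) / 2, ranks)


def combine_even(ranks):
    """Even weighting: every word's rank counts the same."""
    return sum(ranks) / len(ranks)


# Give one word at a time a rank of 100 to expose its effective weight.
for ranks in ([100, 0, 0], [0, 100, 0], [0, 0, 100]):
    print(ranks, combine_pairwise(ranks), combine_even(ranks))
```

Under that assumption the last word always carries half the weight, no matter
how many words come before it, while the even version treats all three alike.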

For example, if a search term occurs more frequently, or earlier, in one
document than it does in other documents, that document should be ranked higher.
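
A minimal sketch of that idea in Python, purely for illustration: the scoring
formula (relative frequency plus a bonus for an early first occurrence) and the
name term_score are assumptions invented for this example, not anything
swish-e does.

```python
def term_score(words, term):
    """Hypothetical per-document score for one search term.

    ASSUMPTION: frequency and earliness are simply added together; a real
    ranking scheme would need to tune how the two are balanced.
    """
    words = [w.lower() for w in words]
    term = term.lower()
    count = words.count(term)
    if count == 0:
        return 0.0
    frequency = count / len(words)          # share of the document that is the term
    first = words.index(term)               # position of the first occurrence
    earliness = 1.0 - first / len(words)    # 1.0 when the term is the first word
    return frequency + earliness


early = "swish indexes files".split()
late = "files are indexed by swish".split()
print(term_score(early, "swish"), term_score(late, "swish"))
```

Here the document that mentions the term first, and in a larger share of its
words, gets the higher score.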

There are lots of other considerations.

Frank
Received on Wed Aug 12 10:28:48 1998