
Re: improving swish-e rank system

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed May 28 2003 - 14:09:51 GMT
On Wed, May 28, 2003 at 01:00:53AM -0700, Emilio Davis wrote:
> 
> Thanks, I'll try what you suggested. I think I'll cut the results to, let's say,
> 200-300 (after the first sort is done), then add my rank and re-sort
> (I'm thinking of a linear combination where the strong part is the swish-e
> rank and my rank only modifies it a bit, so with 200-ish pages I
> don't think I'll miss a noticeable number of 'good' results).
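The cut-then-re-sort idea can be sketched roughly as follows. This is a minimal illustration, not part of swish-e: the `Hit` record, the `custom_rank` score, and the `top_n`/`weight` parameters are all hypothetical, and it assumes swish-e's reported ranks are on a 0-1000 scale so they can be normalized to [0, 1] before blending.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str
    swish_rank: int      # rank reported by swish-e (assumed 0-1000, higher is better)
    custom_rank: float   # hypothetical site-specific score in [0, 1]


def rerank(hits, top_n=250, weight=0.1):
    """Keep the top_n hits by swish-e rank, then re-sort them by a linear blend.

    The blend is dominated by the swish-e rank; `weight` (hypothetical)
    controls how far the custom score can move a document.
    """
    # First pass: trust swish-e's ordering to choose the candidate set.
    candidates = sorted(hits, key=lambda h: h.swish_rank, reverse=True)[:top_n]

    def combined(h):
        # Normalize the swish-e rank to [0, 1] (assumes a maximum of 1000),
        # then mix it linearly with the custom score.
        return (1 - weight) * (h.swish_rank / 1000.0) + weight * h.custom_rank

    # Second pass: re-sort only the candidates by the blended score.
    return sorted(candidates, key=combined, reverse=True)
```

With `weight=0.0` the original swish-e order is preserved; small weights only reorder documents whose swish-e ranks are already close, which matches the "only modify a bit" goal. Documents outside the first `top_n` can never be promoted, which is the trade-off of cutting before re-sorting.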

It would also be very helpful to have a look through the rank.c code.  
It needs some work.  

It would be great to find a way to limit the effect of vastly different
document sizes. Rank is mostly based on word frequency, which does not
work well when file sizes differ greatly. An example can be seen at
http://search.apache.org -- search for "install" and you end up with
mailing list archives and the large CHANGES file at the top (type in
"installation" and it works better).
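To illustrate the size problem, the sketch below contrasts a raw term-frequency score with the term-frequency component of BM25, a widely used ranking formula that saturates repeated occurrences and discounts long documents. This is a swapped-in technique for comparison only; it is not what rank.c currently does, and the parameter values (`k1=1.2`, `b=0.75`) are common defaults, not swish-e settings.

```python
def raw_tf_score(tf):
    # Score grows without bound with term frequency, so a huge file that
    # mentions a word hundreds of times outranks a short, focused page.
    return tf


def bm25_tf_score(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # BM25 term-frequency component:
    #   k1 limits how much extra occurrences can add (saturation),
    #   b scales the penalty for documents longer than the collection average.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)
```

With hypothetical numbers, a 500-word how-to page using "install" 10 times scores 10 under raw frequency, while a 100,000-word CHANGES file using it 200 times scores 200. With an average document length of 2,000 words, the BM25 component gives the short page about 2.09 and the large file about 1.79, so the focused page wins.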

Fuzzy searching doesn't really help.  Try "install" with "sound-alike" 
at http://search.apache.org/index.cgi?full=1.

There's code in rank.c that takes the total number of words in the
file into account (IgnoreTotalWordCountWhenRanking), but I have not
found that it improves ranking much.


-- 
Bill Moseley
moseley@hank.org
Received on Wed May 28 14:09:59 2003