Re: Relevance anomaly?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Nov 20 2003 - 22:19:19 GMT
On Thu, Nov 20, 2003 at 11:36:19AM -0800, Roy Tennant wrote:
> We're using swish-e to search a collection of books and I've discovered  
> an odd thing with relevance that I'm trying to puzzle out. The  
> following search on "freud":
> 
> http://texts.cdlib.org/cgi/searchallbooks.pl?search=freud&mode=book+text&sort=relevance
> 
> returns a list of books, with the 439th result being the book "Freud  
> and His Critics". That book is rife with the word, and yet it is ranked  
> very low. I have verified that it is not my CGI that is doing anything  
> funny, as a command line search provides the same results. Why is that  
> particular book ranked so low, when it has something on the order of  
> almost twice as many occurrences of the word "freud" in it as the  
> top-ranked book?

Hi Roy,

I can't really tell without looking at the source documents and how you
are indexing them.  You can define RAW_RANK when compiling to prevent
swish from scaling the rank output, but that probably won't give enough
detail.  There's also a DEBUG_RANK define that can be set to dump info
about the ranking while searching.  See rank.c for details.
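
To illustrate what compile-time defines like these usually do, here is a minimal sketch. It is not the code in rank.c: the functions scale_rank() and report_rank() and the 1..1000 scaling are assumptions made up for the example, and only the macro names RAW_RANK and DEBUG_RANK come from the text above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch, NOT the actual swish-e rank.c.
 * Only the macro names RAW_RANK and DEBUG_RANK are taken from the post;
 * the functions and the 1..1000 scale are illustrative assumptions. */

/* Return the rank to report for one hit.
 * Default build: scale the raw score relative to the best raw score.
 * Built with -DRAW_RANK: return the raw score untouched. */
static int scale_rank(int raw, int max_raw)
{
    assert(raw >= 0);
#ifdef RAW_RANK
    (void)max_raw;              /* unused when scaling is disabled */
    return raw;
#else
    int scaled;

    if (max_raw <= 0)
        return 0;               /* nothing to scale against */
    scaled = (int)((long)raw * 1000 / max_raw);
    return scaled < 1 ? 1 : scaled;   /* keep every hit above zero */
#endif
}

/* Compute the rank for a file; built with -DDEBUG_RANK it also dumps
 * the numbers that went into the rank to stderr. */
static int report_rank(const char *file, int raw, int max_raw)
{
    int rank = scale_rank(raw, max_raw);
#ifdef DEBUG_RANK
    fprintf(stderr, "%s: raw=%d max=%d rank=%d\n", file, raw, max_raw, rank);
#else
    (void)file;                 /* only needed for debug output */
#endif
    return rank;
}
```

With a typical C compiler the defines would be passed on the command line as -DRAW_RANK and -DDEBUG_RANK. How to pass them through the swish-e build itself is not covered here.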

I know I've talked about it a lot, but ranking still needs a major
overhaul.

-- 
Bill Moseley
moseley@hank.org
Received on Thu Nov 20 22:19:25 2003