SWISH-E ranking algorithm - description?

From: Tito Sierra <tito_sierra(at)not-real.ncsu.edu>
Date: Fri Jan 20 2006 - 17:17:38 GMT

Hello,

I'm looking for documentation describing the ranking algorithm
currently used by swish-e 2.4.X or 2.2.X.  It need not be a full
technical description; a description of which factors influence
document ranking would be enough.

Is it accurate to say that SWISH-E employs a variant of "tf-idf" for  
ranking?
	http://en.wikipedia.org/wiki/Tf-idf
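
To be concrete about what I mean by "tf-idf", here is a minimal Python
sketch of the textbook weighting (term frequency times the log of the
inverse document frequency). It is only an illustration: the function
name tf_idf and the whitespace tokenization are my own assumptions, and
nothing here is taken from the SWISH-E source.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using textbook tf-idf.

    Illustrative only; this is NOT SWISH-E's ranking code.
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = relative frequency in this document,
            # idf = log(N / number of documents containing the term).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for doc_weights in tf_idf(["swish swish index", "index search"]):
        print(doc_weights)
```

In this weighting a term that occurs in every document gets weight zero,
while a term that is frequent in one document and rare elsewhere scores
highest.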

From reading the mailing list archives I understand there is  
interest in improving the ranking algorithm.  For my purposes SWISH-E  
works great as is.  Very nice tool.  My immediate interest is not in  
tweaking or improving the current algorithm, but in describing it to  
others.  If someone has already gone through the trouble of writing  
this up I would love to cite them.  Apologies if I have missed  
something in the mail archives or the swish-e.org website.

Thanks,
Tito

P.S. I have read Josh Rabinowitz's very useful article "Indexing  
Arbitrary Data with SWISH-E."  I'm hoping for something more  
descriptive than this:

"The ranking algorithm used in swish-e does not bear easy  
explanation, but does take into account factors including the size of  
the documents, the frequency of each word in the document, and which  
tags the given text resides in."
Received on Fri Jan 20 09:17:38 2006