Hello,
I'm looking for documentation describing the ranking algorithm
currently used by swish-e 2.4.X or 2.2.X. It need not be a full
technical description; an outline of the factors that influence
document ranking would be enough.
Is it accurate to say that SWISH-E employs a variant of "tf-idf" for
ranking?
http://en.wikipedia.org/wiki/Tf-idf
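To be clear about what I mean by "tf-idf", here is a minimal sketch of
the generic textbook weighting. It is only an illustration and is not
taken from the SWISH-E source: the function name tf_idf, the whitespace
tokenizer, and the unsmoothed log(N / df) form are all my own
assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook tf-idf).

    Hypothetical helper for illustration; not SWISH-E's ranking code.
    """
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency of the term within the document,
        # idf = log(N / df); a term found in every document scores 0.
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

In this formulation a word that appears in every document gets weight
zero, and a word that is frequent in one document but rare elsewhere
gets the highest weight. Whether SWISH-E does anything along these
lines is exactly what I am asking.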
From reading the mailing list archives I understand there is
interest in improving the ranking algorithm. For my purposes SWISH-E
works great as is. Very nice tool. My immediate interest is not in
tweaking or improving the current algorithm, but in describing it to
others. If someone has already gone through the trouble of writing
this up I would love to cite them. Apologies if I have missed
something in the mail archives or the swish-e.org website.
Thanks,
Tito
P.S. I have read Josh Rabinowitz's very useful article "Indexing
Arbitrary Data with SWISH-E." I'm hoping for something more
descriptive than this:
"The ranking algorithm used in swish-e does not bear easy
explanation, but does take into account factors including the size of
the documents, the frequency of each word in the document, and which
tags the given text resides in."
Received on Fri Jan 20 09:17:38 2006