I've found a weird "feature" of swish-e 2.4.5.
If I make a query with many words OR-ed together, I can get strange ranks from swish-e. Here's an example using sman (from http://search.cpan.org/~joshr/Sman-1.01/).
The -rank option shows the rank as the first column, which you can see has ranks like "24636318".
% sman -rank a or b or c or the or bob or element or xml or html or perl or python or plastic or cheese or c or atoi or print or slice or ping
24636318 perltoc (1) perl documentation table of contents
5436486 perl58delta (1) what is new for perl v5.8.0
2384158 CGI (3) Simple Common Gateway Interface Class
2314418 perl571delta (1) what's new for perl v5.7.1
1627450 perlop (1) Perl operators and precedence
1538834 perl561delta (1) what's new for perl v5.6.x
1510578 perlfaq4 (1) Data Manipulation ($Revision: 1.56 $, $Date: 2004/11/03
1494928 perldiag (1) various Perl diagnostics
1423516 perl5004delta (1) what's new for perl5.004
1418476 perlpodspec (1) Plain Old Documentation: format specification and notes
1371134 perlfaq7 (1) General Perl Language Issues ($Revision: 1.18 $, $Date:
1360568 Config (3) access Perl configuration information
1301706 perl581delta (1) what is new for perl v5.8.1
1244514 perlfaq (1) frequently asked questions about Perl ($Date: 2004/10/05
1187740 perlfaq5 (1) Files and Formats ($Revision: 1.31 $, $Date: 2004/02/07
1069226 MIME::Tools (3) modules for parsing (and creating!) MIME entities
1036636 Pod::Parser (3) base class for creating POD filters and translators
870262 perlfunc (1) Perl builtin functions
834496 perlretut (1) Perl regular expressions tutorial
780922 perlxstut (1) Tutorial for writing XSUBs
I tested with the swish-e binary on the index directly and I got the same ranks.
In my latest program (in development), I've seen this cause the ranks to sometimes show up in the range 0-1000 but be sorted incorrectly (i.e., with one of the hits ranked 1000 showing below hits with much lower ranks).
Thought someone might want to know :) Developers, let me know if you need a repeatable test case or more info. Perhaps the rank should/could be computed using doubles or 'long longs' or arbitrary precision numbers?
Also, it would be cool if we could retrieve un-normalized weights from swish-e searches.
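
To illustrate why I'm suggesting wider number types, here is a small Python sketch. It is only a guess at the kind of failure involved, not swish-e's actual code: I have not confirmed how swish-e stores or scales ranks internally. The helper names (wrap_int32, normalize) and the 3,000,000,000 raw weight are made up for the example; the other two weights are taken from the output above. It shows a large raw weight wrapping around in a signed 32-bit integer and sorting last, while arbitrary-precision arithmetic scales the same weights into 0-1000 in the expected order.

```python
def wrap_int32(value):
    """Simulate storing value in a signed 32-bit integer (two's complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def normalize(raw_ranks, top=1000):
    """Scale raw ranks so the largest maps to `top`, using Python's unbounded ints."""
    biggest = max(raw_ranks)
    if biggest <= 0:
        return [0 for _ in raw_ranks]
    return [r * top // biggest for r in raw_ranks]


# Hypothetical raw weights: the first one is invented and too big for 32 bits.
raw = [3_000_000_000, 24_636_318, 5_436_486]

# If the weights were kept in a signed 32-bit int, the best hit would go negative.
wrapped = [wrap_int32(r) for r in raw]

# Sorting the wrapped values puts the real best hit at the bottom.
wrapped_order = sorted(wrapped, reverse=True)

# With wide enough arithmetic the order and the 0-1000 scale both come out sane.
normalized = normalize(raw)

print(wrapped)
print(wrapped_order)
print(normalized)
```

If something like this is going on, computing the raw weight in a wider type before scaling would keep the top hit on top.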
Received on Mon Jun 11 12:51:13 2007