[swish-e] Ranking errors when or-ing many search terms

From: J Robinson <jrobinson852(at)not-real.yahoo.com>
Date: Mon Jun 11 2007 - 15:32:39 GMT

Hello, All:

I've found a weird "feature" of swish-e 2.4.5. 

If I make a query with many words or-ed together, I can get strange ranks from swish-e. Here's an example using sman (from http://search.cpan.org/~joshr/Sman-1.01/).

The -rank option shows the rank in the first column, where you can see values as large as "24636318":

% sman -rank a or b or c or the or bob or element or xml or html or perl or python or plastic or cheese or c or atoi or print or slice or ping 
24636318 perltoc         (1) perl documentation table of contents                                                                         
5436486 perl58delta     (1) what is new for perl v5.8.0                                                                                   
2384158 CGI             (3) Simple Common Gateway Interface Class                                                                         
2314418 perl571delta    (1) what's new for perl v5.7.1                                                                                    
1627450 perlop          (1) Perl operators and precedence                                                                                 
1538834 perl561delta    (1) what's new for perl v5.6.x                                                                                    
1510578 perlfaq4        (1) Data Manipulation ($Revision: 1.56 $, $Date: 2004/11/03                                                       
1494928 perldiag        (1) various Perl diagnostics                                                                                      
1423516 perl5004delta   (1) what's new for perl5.004                                                                                      
1418476 perlpodspec     (1) Plain Old Documentation: format specification and notes                                                       
1371134 perlfaq7        (1) General Perl Language Issues ($Revision: 1.18 $, $Date:                                                       
1360568 Config          (3) access Perl configuration information                                                                         
1301706 perl581delta    (1) what is new for perl v5.8.1                                                                                   
1244514 perlfaq         (1) frequently asked questions about Perl ($Date: 2004/10/05                                                      
1187740 perlfaq5        (1) Files and Formats ($Revision: 1.31 $, $Date: 2004/02/07                                                       
1069226 MIME::Tools     (3) modules for parsing (and creating!) MIME entities                                                             
1036636 Pod::Parser     (3) base class for creating POD filters and translators                                                           
870262 perlfunc        (1) Perl builtin functions                                                                                         
834496 perlretut       (1) Perl regular expressions tutorial                                                                              
780922 perlxstut       (1) Tutorial for writing XSUBs   

I tested with the swish-e binary on the index directly and I got the same ranks. 

In my latest program (in development), I've seen this cause the ranks to sometimes show up in the range 0-1000 but be sorted incorrectly (i.e., with one of the hits ranked 1000 showing below hits with much lower ranks).
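In case it helps anyone build a repeatable test case, here is a minimal Python sketch that scans rank-prefixed result lines (the layout sman -rank prints above) and flags the two symptoms described: ranks above the expected ceiling and hits listed below a lower-ranked hit. The names parse_ranks and check_ranks are my own, and the ceiling of 1000 is an assumption based on the 0-1000 range mentioned above.

```python
import re

# A result line is assumed to start with an integer rank followed by the
# page name, as in the "sman -rank" output quoted above.
RANK_LINE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_ranks(output):
    """Return a list of (rank, name) pairs from rank-prefixed result lines."""
    hits = []
    for line in output.splitlines():
        match = RANK_LINE.match(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def check_ranks(hits, max_rank=1000):
    """Return descriptions of ranks that are too large or out of order.

    max_rank=1000 is an assumed ceiling, not a value taken from swish-e.
    """
    problems = []
    for i, (rank, name) in enumerate(hits):
        if rank > max_rank:
            problems.append(f"{name}: rank {rank} exceeds {max_rank}")
        if i > 0 and rank > hits[i - 1][0]:
            problems.append(
                f"{name}: rank {rank} is listed below a hit ranked {hits[i - 1][0]}"
            )
    return problems
```

Feeding it the captured output of a query and printing each entry of check_ranks(parse_ranks(text)) should be enough to tell whether a given index and query show the problem.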

Thought someone might want to know :) Developers, let me know if you need a repeatable test case or more info. Perhaps the rank should/could be computed using doubles, 'long long's, or arbitrary-precision numbers?
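To illustrate why a wider type might matter, here is a purely hypothetical Python sketch. It is not swish-e's code and does not reproduce the numbers above; it assumes a normalization step of the form raw * 1000 / max_raw and simulates the product being held in a signed 32-bit integer. (In C, signed overflow is formally undefined behavior; wrapping is just the commonly observed result.) The function names are made up for the example.

```python
def to_int32(value):
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def trunc_div(numerator, denominator):
    """Integer division that truncates toward zero, as C does."""
    quotient = abs(numerator) // abs(denominator)
    return quotient if (numerator < 0) == (denominator < 0) else -quotient


def normalize_int32(raw, max_raw, scale=1000):
    """Hypothetical scaling where raw * scale is wrapped to 32 bits first."""
    product = to_int32(raw * scale)
    return trunc_div(product, max_raw)


def normalize_wide(raw, max_raw, scale=1000):
    """The same scaling with no width limit (Python ints do not overflow)."""
    return trunc_div(raw * scale, max_raw)


if __name__ == "__main__":
    raw = max_raw = 3_000_000  # made-up raw score for the top hit
    print("32-bit product:", normalize_int32(raw, max_raw))
    print("wide product:  ", normalize_wide(raw, max_raw))
```

With small raw scores the two functions agree; once raw * 1000 no longer fits in 31 bits, the 32-bit version returns a value that is no longer in 0-1000, while the wide version still returns 1000 for the top hit.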

Also, it would be cool if we could retrieve un-normalized weights from swish-e searches.

Best, 
jrobinson






 
Received on Mon Jun 11 12:51:13 2007