Re: improving swish-e rank system

From: <moseley(at)not-real.hank.org>
Date: Wed May 28 2003 - 06:23:37 GMT
On Tue, May 27, 2003 at 03:44:12PM -0700, Emilio Davis wrote:
> Hello, I'm a computer engineering student and I'm currently working on my
> degree thesis. Part of the thesis is to implement a modified version of
> PageRank. That work is done and I have included that rank in swish-e
> using meta tags. Now I want to mix the swish-e rank and the new pagerank
> (in a linear combination) to improve the search results. Is there any clean
> way to do it? (I know I can mix those ranks and sort after swish-e has
> given me the results, but that option uses a lot of memory.)

Hello Emilio,

This has been discussed lately.  So what you want to do is have a meta
tag on documents (thus a value stored as a property) and then have that
value modify the rank of the file. Is that correct?

One problem with that method is that the property table must be read for
each and every result.  It may be a small problem, but reading a property
requires a bit of I/O, so it might slow down result generation.  You will
just have to try it and see, ideally with an index where a query might
return 30,000 results.

search.c looks up individual words in the index, and rank.c calculates a 
rank number for each file (based on that word).

search.c also combines ranks in AND and OR operations.

After all hits have been found, result_sort.c is called to sort the 
results.  Since your rank bias would modify the rank, you would either 
need to add a new step to look up the page ranks, or add some code 
to result_sort.c to look up the page ranks.  The latter is probably 
best, because result_sort.c is already looping through the list of 
results.
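
As a rough illustration of the linear combination applied in a single pass
over the results, here is a minimal sketch.  It is not swish-e code: the
"hit" struct, blend_ranks() and the alpha weight are all hypothetical names,
and it assumes the page rank has already been looked up and is on a scale
comparable to the word-based rank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a search result; NOT swish-e's real struct. */
typedef struct hit {
    int filenum;        /* file number of the hit */
    double rank;        /* word-based rank from the search */
    double page_rank;   /* link-based score, assumed already looked up */
    struct hit *next;   /* next hit in the result list */
} hit;

/* Blend the two scores in place:
 *     rank = alpha * rank + (1 - alpha) * page_rank
 * Returns the largest blended rank seen, or 0.0 for an empty list. */
double blend_ranks(hit *head, double alpha)
{
    double biggest = 0.0;

    assert(alpha >= 0.0 && alpha <= 1.0);

    for (hit *h = head; h != NULL; h = h->next) {
        h->rank = alpha * h->rank + (1.0 - alpha) * h->page_rank;
        if (h->rank > biggest)
            biggest = h->rank;
    }
    return biggest;
}
```

Doing the blend inside a loop that already visits every result avoids a
second pass, and the largest value can be tracked in the same loop.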

You might note that result_sort.c is where the "bigrank" is found -- 
the largest rank number is found when looping through all the results.  
This number is used to create "rank_scale_factor" which is used to scale 
results from 1...1000 when printing the rank.
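
To make the scaling idea concrete, here is a small sketch of mapping a raw
rank onto 1...1000 using the largest rank in the result set.  The function
name scale_rank() and the exact arithmetic are assumptions for illustration;
the real rank_scale_factor calculation in result_sort.c may differ.

```c
#include <assert.h>

/* Map a raw rank onto 1..1000, given the largest raw rank ("bigrank")
 * found in the result set.  Illustrative only; not swish-e's formula. */
int scale_rank(int raw_rank, int bigrank)
{
    assert(bigrank > 0);
    assert(raw_rank >= 0 && raw_rank <= bigrank);

    /* Widen before multiplying so large ranks do not overflow int. */
    int scaled = (int)((long long)raw_rank * 1000 / bigrank);

    /* Clamp so even the weakest hit prints as at least 1. */
    return scaled < 1 ? 1 : scaled;
}
```

If a page-rank bias changes the raw ranks, the largest value has to be
found after the bias is applied, otherwise the top result would no longer
scale to 1000.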

You can look at the code in docprop.c to see how to look up a property 
by passing in a "result" structure.  You can also look at libtest.c for 
an example.

Again, you may find that reading a property from the property file is 
too slow.  Another option would be to create a separate table of just 
page rank numbers indexed by file number.  That would likely be faster 
than reading the property file directly.  Swish-e uses tables like that 
to make sorting faster (swish pre-sorts properties at indexing time and 
creates integer tables that are used for sorting by properties at search 
time).
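
A table like that could be as simple as an integer array held in memory.
The sketch below is only an illustration: the pagerank_table type and its
functions are hypothetical, not part of swish-e, and it assumes file
numbers start at 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory table: one page-rank value per file number. */
typedef struct {
    int *values;   /* values[filenum - 1]; file numbers assumed 1-based */
    int count;     /* number of files covered by the table */
} pagerank_table;

/* Allocate a zero-filled table; returns 0 on success, -1 on failure. */
int pagerank_table_init(pagerank_table *t, int count)
{
    assert(count > 0);
    t->values = calloc((size_t)count, sizeof *t->values);
    t->count = t->values ? count : 0;
    return t->values ? 0 : -1;
}

/* Store a value, e.g. while building the table at indexing time. */
void pagerank_table_set(pagerank_table *t, int filenum, int value)
{
    assert(filenum >= 1 && filenum <= t->count);
    t->values[filenum - 1] = value;
}

/* Constant-time lookup; returns 0 for a file number outside the table. */
int pagerank_table_get(const pagerank_table *t, int filenum)
{
    if (filenum < 1 || filenum > t->count)
        return 0;
    return t->values[filenum - 1];
}

void pagerank_table_free(pagerank_table *t)
{
    free(t->values);
    t->values = NULL;
    t->count = 0;
}
```

Each lookup is then a single array access with no file I/O, at a cost of
one integer of memory per indexed file.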

Anyway, make sure you are using 2.4.0 code or code from cvs.


-- 
Bill Moseley
moseley@hank.org
Received on Wed May 28 06:23:48 2003