On Tue, May 27, 2003 at 03:44:12PM -0700, Emilio Davis wrote:
> Hello, I'm a computer engineering student and I'm currently working on my
> degree thesis. Part of the thesis is to implement a modified version of
> PageRank. That work is done and I have included that rank in swish-e
> using meta tags. Now I want to mix the swish-e rank and the new PageRank
> (in a linear combination) to improve the search results. Is there any clean
> way to do it? (I know I can mix those ranks and sort after swish-e has
> given me the results, but that option uses a lot of memory.)

This has been discussed lately. So what you want to do is have a meta
tag on documents (thus a value stored as a property) and then have that
value modify the rank of the file. Is that correct?

One problem with that method is that the property table must be read for
each and every result. It may be a small problem, but it might slow down
result generation, since reading a property requires a bit of I/O. You'll
just have to try it and see, and try it with an index where you might get
30,000 results.

search.c looks up individual words in the index, and rank.c calculates a
rank number for each file (based on that word).
search.c also combines ranks in AND and OR operations.

After all hits have been found, result_sort.c is called to sort the
results. Since your rank bias would modify the rank, you would either
need to add a new step to look up the page ranks, or add some code
to result_sort.c to look up the page ranks. It's probably best in
result_sort.c because that's already looping through the list of
results.
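
As a rough sketch of that extra step (not actual swish-e code), the loop
could blend each hit's rank with its page rank as a linear combination and
keep track of the largest blended value at the same time. The names hit_t,
page_rank_fn, array_lookup and blend_ranks, and the weight alpha, are all
made up for illustration. The real result structure has many more fields,
and the lookup callback stands in for whatever you end up using (a property
read or a separate table).

```c
#include <stddef.h>

/* Hypothetical stand-in for one search hit; swish-e's real result
 * structure has many more fields. */
typedef struct {
    int filenum;  /* file number of the matching document */
    int rank;     /* raw rank computed during the word lookup */
} hit_t;

/* Hypothetical callback returning the stored page rank for a file. */
typedef int (*page_rank_fn)(int filenum, void *ctx);

/* Placeholder lookup for the example: ctx is an int array indexed
 * directly by file number (no bounds checking here). */
int array_lookup(int filenum, void *ctx)
{
    const int *table = ctx;
    return table[filenum];
}

/*
 * Blend each hit's rank with its page rank:
 *     new_rank = alpha * rank + (1 - alpha) * page_rank
 * Returns the largest blended rank seen (0 if there are no hits).
 * Assumes ranks and page ranks are non-negative.
 */
int blend_ranks(hit_t *hits, size_t n, double alpha,
                page_rank_fn lookup, void *ctx)
{
    int bigrank = 0;

    for (size_t i = 0; i < n; i++) {
        int pr = lookup(hits[i].filenum, ctx);
        double mixed = alpha * hits[i].rank + (1.0 - alpha) * pr;

        hits[i].rank = (int)(mixed + 0.5);  /* round to nearest integer */
        if (hits[i].rank > bigrank)
            bigrank = hits[i].rank;
    }
    return bigrank;
}
```

With alpha = 0.5, a hit with rank 100 and page rank 10 ends up at 55, and a
hit with rank 50 and page rank 90 ends up at 70, so the second one now sorts
first. Doing the blend in the same pass means no second list of results has
to be held in memory.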

You might note that result_sort.c is where "bigrank", the largest rank
number, is found while looping through all the results. This number is
used to create "rank_scale_factor", which is used to scale ranks to
1...1000 when printing them.
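
This matters for a rank bias because the largest rank has to be found after
the bias is applied, otherwise the scaled numbers can fall outside 1...1000.
The idea of the scaling is roughly the following. This is only an
illustration: scale_rank is a made-up name, and the exact arithmetic swish-e
uses for rank_scale_factor may well differ.

```c
/*
 * Scale a raw rank into the range 1..1000 relative to the largest
 * rank in the result set. Hypothetical helper, not swish-e code.
 */
int scale_rank(int rank, int bigrank)
{
    long long scaled;

    if (bigrank <= 0)
        return 1;                       /* nothing sensible to scale against */

    scaled = (long long)rank * 1000 / bigrank;

    if (scaled < 1)
        scaled = 1;                     /* never report a rank below 1 */
    if (scaled > 1000)
        scaled = 1000;                  /* clamp if rank exceeds bigrank */

    return (int)scaled;
}
```

The top hit always comes out as 1000, and a hit with half the top rank comes
out as 500.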

You can look at the code in docprop.c to see how to look up a property
by passing in a "result" structure. You can also look at libtest.c for
an example.

Again, you may find that reading a property from the property file is
too slow. Another option would be to create a separate table of just page
rank numbers indexed by file number. That would likely be faster than
reading the property file directly. Swish-e uses tables like that to
make sorting faster (swish pre-sorts properties at indexing time and
creates integer tables that are used for sorting by properties at search
time).
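
A minimal sketch of such a table, assuming file numbers start at 1 and the
page rank fits in an int. The rank_table type and its functions are invented
for this example and are not part of swish-e. You would still need to decide
how to store the table alongside the index and load it at search time.

```c
#include <stdlib.h>

/* Hypothetical in-memory table of page ranks, one slot per file. */
typedef struct {
    int   *ranks;    /* ranks[filenum - 1] holds the page rank */
    size_t n_files;  /* number of files the table covers */
} rank_table;

/* Allocate a zero-filled table; returns 0 on success, -1 on failure. */
int rank_table_init(rank_table *t, size_t n_files)
{
    t->ranks = calloc(n_files, sizeof *t->ranks);
    t->n_files = t->ranks ? n_files : 0;
    return t->ranks ? 0 : -1;
}

/* Store a page rank (e.g. at indexing time); -1 if filenum is out of range. */
int rank_table_set(rank_table *t, int filenum, int rank)
{
    if (filenum < 1 || (size_t)filenum > t->n_files)
        return -1;
    t->ranks[filenum - 1] = rank;
    return 0;
}

/* Look up a page rank in constant time; unknown files get rank 0. */
int rank_table_get(const rank_table *t, int filenum)
{
    if (filenum < 1 || (size_t)filenum > t->n_files)
        return 0;
    return t->ranks[filenum - 1];
}

/* Release the table's memory. */
void rank_table_free(rank_table *t)
{
    free(t->ranks);
    t->ranks = NULL;
    t->n_files = 0;
}
```

The lookup is a single array access per result, with no I/O once the table
is in memory, which is the whole point compared to reading a property for
every hit.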

Anyway, make sure you are using 2.4.0 code or code from CVS.
Received on Wed May 28 06:23:48 2003