Re: Ranking, even with strong bias

From: Walter Lewis <lewisw(at)>
Date: Fri Feb 04 2005 - 14:09:38 GMT
Thomas R. Bruce wrote:
 > This does help, but not enough for some applications.  A real problem 
with relevance-ranked searches of collections of judicial opinions is 
that it's hard to force title weight high enough to overcome large 
numbers of term-occurrences in the body text -- which is exactly what 
you get with important legal cases, because really important rulings are 
heavily cited.  So the cases that repeatedly cite (eg.) Brown v. Board 
of Education inevitably rank higher, all the more maddening because the 
more important the case being sought by the user the more likely it is 
to be swamped by cases citing it.   I guess other literatures manage to 
avoid this because citations don't give the title of the cited document 
in full as they do in judicial opinions.
 > Anyway, our cheap kludge for dealing with this is [snip]
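To make the problem concrete, here is a toy scoring sketch. The weight and the scoring function are hypothetical and are not Swish-e's actual ranking formula. A title hit counts several times more than a body hit, yet a citing opinion that repeats the case name often enough still outscores the cited case itself.

```python
# Toy relevance score: hypothetical weights, NOT Swish-e's real ranking formula.
TITLE_WEIGHT = 7  # assumed boost for a hit in the title

def toy_score(doc, term):
    """Count occurrences of term in title and body, boosting title hits."""
    title_hits = doc["title"].lower().count(term.lower())
    body_hits = doc["body"].lower().count(term.lower())
    return TITLE_WEIGHT * title_hits + body_hits

query = "Brown v. Board of Education"
cited = {"title": "Brown v. Board of Education",
         "body": "Opinion text mentioning Brown v. Board of Education once."}
citing = {"title": "Some Later Case",
          "body": " ".join(["See Brown v. Board of Education."] * 12)}

for name, doc in [("cited", cited), ("citing", citing)]:
    print(name, toy_score(doc, query))  # cited 8, then citing 12
```

However large the title weight, enough citations in the body will eventually overtake it.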

Here's an alternate cheap kludge that may (or may not) add value.

I transform XML into pseudo-HTML before passing it to the indexer.
The HTML <title> element seems to be weighted more heavily by the
default ranking scheme than elements from XML schemas, although <title>
may well be a special case in that respect. I leave that to those who
have parsed or written the code.

More to the point, because the only time this particular document is 
going to be "read" is by the indexer, in the course of the 
transformation I can double, triple or 50x (heck, it's only a loop) the 
number of times a particular string, such as the title, is presented to the
indexer.  So feed the indexer <title> fifty times and see if that 
doesn't shift its ranking. I do that in the HTML <body> element so that 
it doesn't appear extra times in the swish output, and I also control 
what goes into description so that it doesn't pop up multiple times 
there as well.
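A minimal sketch of the transformation described above, assuming an XML record with hypothetical <title>, <summary> and <text> child elements. The element names, the repeat count and the use of a <meta name="description"> tag to keep the summary separate are all illustrative assumptions. Whether your indexer actually reads the description from there depends on its configuration. The extra title copies go only into <body>, so the <title> shown in search results appears once.

```python
import html
import xml.etree.ElementTree as ET

TITLE_REPEAT = 50  # "heck, it's only a loop"

def xml_to_index_html(xml_text, repeat=TITLE_REPEAT):
    """Turn an XML case record into pseudo-HTML meant only for the indexer.

    Assumes hypothetical <title>, <summary> and <text> child elements.
    """
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="").strip()
    summary = root.findtext("summary", default="").strip()
    text = root.findtext("text", default="").strip()
    t = html.escape(title)
    # Repeat the title inside <body> only, so <title> itself appears once.
    boost = "\n".join(f"<p>{t}</p>" for _ in range(repeat))
    return (
        "<html><head>\n"
        f"<title>{t}</title>\n"
        # Summary kept apart so the repeated titles don't leak into it.
        f'<meta name="description" content="{html.escape(summary)}">\n'
        "</head><body>\n"
        f"{boost}\n"
        f"<p>{html.escape(text)}</p>\n"
        "</body></html>\n"
    )

sample = """<case><title>Brown v. Board of Education</title>
<summary>School segregation case.</summary>
<text>Full opinion text...</text></case>"""
page = xml_to_index_html(sample)
print(page.count("<p>Brown v. Board of Education</p>"))  # 50
```

Because nobody reads this intermediate file except the indexer, the duplication costs nothing in the displayed results. Only the term counts the ranking sees are affected.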

I deeply respect those who are going about this the *right* way and look 
forward to the results of their work. Until then, this "sort" of gets 
the job done.

Walter Lewis
Received on Fri Feb 4 06:09:40 2005