Re: ranking ideas

From: Peter Karman <karman(at)not-real.cray.com>
Date: Mon Apr 19 2004 - 12:17:41 GMT

Peter Karman wrote on 4/16/04 4:38 PM:

> 2. RelativeFrequencyBias *percent* *bias* *max*
> 
> I know, this seems like newer new math (and my math was never 
> outstanding; I'm a literary critic by training...). But consider this 
> example and please tell me where my logic is wrong:

I must have been high when I wrote this last Friday. My example was just 
totally wrong.

What I want to do in ranking is account for the relative frequency of 
each query word in the total found set. Then apply something like audio 
compression (not like mp3, but more like analog compression in 
recording), where all the softer sounds are brought up to a threshold 
and all the louder sounds are tapered off at a max, thereby reducing the 
sonic range to within a min and max.
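
As a rough illustration of the compression idea, here is a minimal sketch that simply clamps a value into a fixed range. The function name compress_range and its parameters are made up for this example and are not part of swish-e. A hard clamp is only one way to read "brought up to a threshold" and "tapered off at a max"; a real compressor would scale values smoothly rather than cut them off.

```c
#include <assert.h>

/* Hypothetical helper, not from rank.c: force a weighting value into
 * the range [lo, hi], the way a limiter narrows an audio signal. */
double compress_range(double value, double lo, double hi)
{
    assert(lo <= hi);       /* the range must be well-formed */
    if (value < lo)
        return lo;          /* quiet values are raised to the floor */
    if (value > hi)
        return hi;          /* loud values are capped at the ceiling */
    return value;           /* everything in between is left alone */
}
```

With a range of 1.0 to 5.0, for instance, a value of 0.5 would come back as 1.0 and a value of 6.67 would come back as 5.0.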

Example:

A search for 'the foo' turns up 100 hits. 'the' appears a total of 1000 
times in those 100 hits. 'foo' appears a total of 150 times. My 
assumption is that 'foo' is a more important word than 'the', based on 
those numbers.

If we use this formula:

f_bias = max_freq / freq

then the f_bias for 'foo' would be:

6.67 = 1000 / 150
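
To make the arithmetic concrete, here is a small sketch of the formula in C. The function frequency_bias is hypothetical and does not exist in rank.c; it assumes the per-word totals across the found set are already known, and that max_freq is the total for the most frequent query word.

```c
#include <assert.h>

/* Hypothetical helper, not part of rank.c.
 * freq     = total occurrences of this word in the found set
 * max_freq = total occurrences of the most frequent query word
 * Returns max_freq / freq, so the most common word gets a bias of 1
 * and rarer words get a larger bias. */
double frequency_bias(unsigned long freq, unsigned long max_freq)
{
    assert(freq > 0 && freq <= max_freq);
    return (double)max_freq / (double)freq;
}
```

With the numbers above, frequency_bias(150, 1000) is about 6.67 for 'foo', while frequency_bias(1000, 1000) is exactly 1 for 'the', so the most common word is left unchanged.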

In rank.c right now, each word's raw rank per doc is calculated based on 
structure (context where it appears) and any MetaNamesRank value.

rank += sw->structure_map[ GET_STRUCTURE(posdata[i]) ] + meta_bias;

What I'm proposing is this:

rank += ( sw->structure_map[ GET_STRUCTURE(posdata[i]) ] + meta_bias )
        * f_bias;

In my example, if a doc had 'foo' 10 times in a structure worth 9 
points, for a normal rank of 90, its rank would jump to about 600 points 
(10 * 9 * 6.67). This makes sense to me, because in our example, this 
particular doc has about 6.7% (10 of 150) of the total occurrences of 
'foo' in it, making it a pretty 'relevant' doc.

This lets docs with less common words rise faster in the rankings than 
docs with equal instances of more common words.
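
Here is a sketch of how two docs with equal instances of a common and a less common word would compare under the proposed formula. The function biased_rank is hypothetical: it assumes every occurrence sits in the same structure, so one weight is passed in directly instead of being looked up through sw->structure_map, and it uses a double for the rank, which may not match the type used in rank.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-word loop in rank.c.
 * occurrences      = times the word appears in this doc
 * structure_weight = points for the structure it appears in
 * meta_bias        = any MetaNamesRank value
 * f_bias           = max_freq / freq for this word */
double biased_rank(int occurrences, int structure_weight,
                   int meta_bias, double f_bias)
{
    double rank = 0.0;
    int i;

    assert(occurrences >= 0);
    for (i = 0; i < occurrences; i++)
        rank += (structure_weight + meta_bias) * f_bias;
    return rank;
}
```

For 10 occurrences in a 9-point structure with no meta bias, 'the' (f_bias of 1) stays at 90, while 'foo' (f_bias of 1000 / 150) comes out at roughly 600.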

What do you think?

pek

-- 
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Mon Apr 19 05:17:43 2004