Re: Weighting meta-names

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Dec 04 2002 - 04:29:16 GMT
At 06:20 PM 12/03/02 -0800, Holden Glova wrote:
>I noticed in the docs a 'MetaNamesRank' config option which suggests to apply
>weightings to the listed meta-names for ranking purposes. Our current client
>would like to be able to weight certain meta-names higher than other
>meta-names, so I thought this would be a perfect candidate to achieve this. I
>also saw that it says "to be implemented".

Yes, to be implemented.

>The CONFIG document on the website is dated Sept 9th 2002 and there has since
>been a new release so I thought the docs might not have been updated with the
>release.

Nope, not this time.  The new release didn't change any config settings so
that doc didn't need updating.

>I then modified my swish-e config and added some MetaNamesRank, to 
>my surprise swish-e did not complain about the config option.

Right, the code for parsing the configuration was done first.

>To my further 
>surprise, swish-e altered the order of the search results!

Na, no way.  It doesn't do anything.  grep "rank_bias" in the source -- all
that's implemented in 2.2.2 is parsing from the config file and saving it
in the index.  It doesn't alter ranking.

>So, my questions; does the 'MetaNamesRank' directive actually apply weightings
>as the CONFIG doc might suggest? If not, what does the 'MetaNamesRank'
>directive do?

Mostly it generates questions on the list.  I imagined it would be
implemented sooner.

>Is there any way to apply weightings to specified meta-names?

Well, yes.  Yesterday I added that feature to the ranking code.  If you get
a daily snapshot or check out from cvs you can give it a try.  I didn't
spend much time on it, and I plan to refine it more.

The documentation says you can enter a number from -10 to +10.  I'm not
sure that will stay that way.  A setting of 10 would mean a single word
would have about the same rank as ten words in a meta that didn't have any
bias.

Basically, the rank bias number is added to the value of the word at a
given position in the document (meaning if it's in a <title> or <h1> vs.
just in the <body>).  Those HTML values are set in config.h.  <h1>, for
example, is set in config.h as 3.

Normally HTML would not mix meta names and HTML tags, but with the libxml2
parser you can make up fake html tags as metanames.

So, with <foo><h1>word</h1></foo>, if foo was set with a MetaNamesRank of 5,
"word" would have a value of 3 + 5 + 1 = 9.  Meaning it's like 9 unbiased words.
The rank for the document is the log of the sum of all the word values in
the document.  I think a word's rank should also be dependent on how many
documents contain that word, but that's not done at this time.
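
To make the arithmetic above concrete, here is a rough Python sketch of the
calculation as described in this message.  It is not the swish-e source (which
is C).  The names word_value, document_rank and STRUCTURE_VALUE are made up for
illustration, and the value of 0 for plain <body> text is an assumption; only
the <h1> value of 3 comes from the text above.

```python
import math

# Structure values: only <h1> = 3 is stated in the message (from config.h).
# The 0 for plain body text is an assumption made for this sketch.
STRUCTURE_VALUE = {"body": 0, "h1": 3}

BASE_WORD_VALUE = 1  # an unbiased word in plain body text counts as 1


def word_value(structure, rank_bias=0):
    """Value of one word hit: base + HTML structure value + meta rank bias."""
    return BASE_WORD_VALUE + STRUCTURE_VALUE[structure] + rank_bias


def document_rank(hits):
    """Rank of a document: log of the sum of all its word values.

    hits is a list of (structure, rank_bias) pairs, one per matching word.
    This sketch does not handle a non-positive sum, which a large negative
    bias could produce; the message does not say how that case is treated.
    """
    total = sum(word_value(structure, bias) for structure, bias in hits)
    return math.log(total)


# <foo><h1>word</h1></foo> with a MetaNamesRank of 5 on foo:
print(word_value("h1", 5))  # 9, i.e. 3 + 5 + 1
```

Under these assumptions, one biased <h1> hit ranks the same as nine unbiased
body words, since both sum to 9 before the log is taken.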

I seem to not be able to give simple yes or no answers.

So the simple answer is yes, you can use the 2.3.x dev version and try it out.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Wed Dec 4 04:29:40 2002