Re: Field Weighing for xml docs

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jul 22 2004 - 14:32:20 GMT
On Thu, Jul 22, 2004 at 06:55:14AM -0700, Tac wrote:
> I remember reading somewhere that swish internally weighed swishtitle
> heavier than other fields, and h1 tags higher in html documents.  Is there
> documentation on how to control this for XML files?

Kind of -- but it's a different system.

The weighting you describe above works by checking the "structure"
bits recorded for each word indexed.  Swish stores a small amount of
data for each and every word it indexes: the word's position (for
phrase matching) and a structure byte that flags where in an HTML
document the word was found (e.g. in <title>, <h1> or <body>).  That
data is recorded only for HTML documents (which is what swish-e was
first designed to index).
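
To make the idea concrete, here is a rough sketch of ranking from
structure bits.  It is not swish-e's code: the flag values, the
weights, and the function names are all made up for illustration.

```python
# Hypothetical flag values and weights, for illustration only.
# These are NOT the constants swish-e uses internally.
IN_BODY = 0x01
IN_TITLE = 0x02
IN_H1 = 0x04

WEIGHTS = {IN_BODY: 1, IN_TITLE: 5, IN_H1: 3}


def word_weight(structure):
    """Add up the weight of every structure bit set for one indexed word."""
    return sum(weight for flag, weight in WEIGHTS.items() if structure & flag)


def document_rank(postings):
    """Sum the weights over all (position, structure) entries for a query word."""
    return sum(word_weight(structure) for _position, structure in postings)


# One query word found three times in a document: once in the title,
# once in an <h1> inside the body, and once in plain body text.
postings = [(0, IN_TITLE), (12, IN_BODY | IN_H1), (40, IN_BODY)]
rank = document_rank(postings)
```

The position is carried along only because it is stored next to the
structure byte; this sketch does not use it for ranking.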

There's also a config option called MetaNamesRank, which is supposed
to allow adjusting the rank based on metaname.  It's listed in

   http://swish-e.org/current/docs/SWISH-CONFIG.html

The docs say it isn't implemented, although it is in CVS (and, I
thought, in the last released version of swish as well -- you could
look at rank.c in your own version to check).  How well it works is up
for debate.  There's been a lot of talk about improvements to the
ranking code.
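
As a rough picture of what a per-metaname adjustment could look like,
here is a sketch.  The bias table, the scaling formula, and the
"internaldata" metaname are assumptions for illustration; rank.c is
the place to see what swish-e really does.

```python
def biased_rank(base_rank, metaname, bias_table):
    """Scale a rank by a per-metaname bias (hypothetical formula).

    A bias of 0 leaves the rank unchanged; positive values raise it and
    negative values lower it, in steps of 10% per bias point.
    """
    bias = bias_table.get(metaname, 0)
    return base_rank * (10 + bias) // 10


# Hypothetical biases: favour words found in swishtitle, demote
# words found in an "internaldata" metaname.
bias_table = {"swishtitle": 4, "internaldata": -5}

title_rank = biased_rank(100, "swishtitle", bias_table)
internal_rank = biased_rank(100, "internaldata", bias_table)
other_rank = biased_rank(100, "author", bias_table)
```

Integer arithmetic keeps the ranks as whole numbers, so the results do
not depend on floating-point rounding.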

-- 
Bill Moseley
moseley@hank.org
Received on Thu Jul 22 07:32:36 2004