RE: Relevance ranking

From: David Norris <kg9ae(at)not-real.geocities.com>
Date: Fri Aug 13 1999 - 21:51:44 GMT

> I'm looking for information on how swish-e 1.3.x computes the relevance
> ranking scores.

The source code is the best documentation available.  I don't know all of
the answers offhand, so consider this a quick response in the hope that it
yields something useful.

I don't know how much weight, if any, each of the following carries in the
ranking.  It should be fairly clearly defined once you find it in the
source; I've had my head in that portion once or twice.  Header generally
means H1, H2, H3, etc.  Head means HEAD.  Emphasis means I, B, STRONG, EM,
etc.  These follow HTML context.  I don't believe that meta element contents
are treated specially unless specified in the indexer configuration.  The
source and/or someone else on the list should be able to clarify further.
It might help to look through the search help, which explains how the
various search features work:
http://sunsite.berkeley.edu/SWISH-E/Manual/searchhelp.html
See the #context and #meta sections specifically.

Function getrank() on line 631 of index.c should give you some insight into
the ranking.  You will probably want to look at the related code as well.
The rank is calculated from data gathered in several places.  I believe that
the integer "emphasized" holds the only values related to HTML, and that
tfreq, freq, and words are based only on the text.  Without looking more
closely, I'm not sure exactly what is considered text; it may include meta
contents (<meta content="...">) as well as the normal text.
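
To make the shape of such a calculation concrete, here is a minimal sketch
in C of how four inputs like these could be combined.  It is not the
getrank() code from index.c, and the formula is invented for illustration.
The meanings given to freq, tfreq and words, the SKETCH_* flag names and
values, and both function names are all assumptions; check index.c for what
swish-e really does.

```c
#include <assert.h>

/* Hypothetical context flags.  The names and bit values are illustrative
 * only and are not taken from the swish-e source. */
enum {
    SKETCH_IN_HEAD     = 1 << 0,  /* word seen inside HEAD */
    SKETCH_IN_HEADER   = 1 << 1,  /* word seen inside H1, H2, H3, ... */
    SKETCH_IN_EMPHASIS = 1 << 2   /* word seen inside I, B, STRONG, EM, ... */
};

/* Count how many context flags are set in the bitmask. */
static int sketch_count_flags(int emphasized)
{
    int n = 0;
    while (emphasized) {
        n += emphasized & 1;
        emphasized >>= 1;
    }
    return n;
}

/* Illustrative rank, under these ASSUMED meanings:
 *   freq       - occurrences of the word in this file
 *   tfreq      - occurrences of the word across the whole index
 *   words      - total number of words in this file
 *   emphasized - OR of the SKETCH_* flags above
 * The score rises with freq, falls for common words (tfreq) and for long
 * files (words), and is multiplied by one plus the number of HTML
 * contexts the word was seen in. */
static double sketch_rank(int freq, int tfreq, int words, int emphasized)
{
    assert(freq > 0 && tfreq >= freq && words >= freq && emphasized >= 0);

    double base = (double)freq / ((double)tfreq * (double)words);
    return base * (1 + sketch_count_flags(emphasized));
}
```

With this sketch, a word that appears in a header and in emphasized text
scores three times what the same word scores in plain text.  Any real boost
would depend on how getrank() uses "emphasized".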

,David Norris

World Wide Web - http://www.geocities.com/CapeCanaveral/Lab/1652/
Home Computer - http://illusionary.tzo.cc/
Page via mail - 412039@pager.mirabilis.com
ICQ Universal Internet Number - 412039
E-Mail - kg9ae@geocities.com
Received on Fri Aug 13 14:43:55 1999