Re: Custom ranking...

From: Nikolaus Rath <Nikolaus(at)not-real.rath.org>
Date: Wed Dec 18 2002 - 14:06:38 GMT
Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hello!
> 
> I want some keywords in the head of an HTML file to rank higher
> than all words in the body. How can I achieve that? The following
> files are ranked equally when searching for "pattern".

Sorry for the unnecessary question. Bill Moseley explained this in his
posting: you have to specify "MetaNames keywords" and search for
"keywords=pattern or pattern". In that case the meta matches rank
higher.
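Here is a rough illustration of why the OR query helps, as a toy Python model. It assumes, without taking it from swish-e's source, that every matching clause of an OR query contributes its own score and the scores are summed. Each clause scores 1 + ln(tf), which is also made up. The document names and the `body` field, which stands in for the unqualified search, are hypothetical.

```python
import math

# Toy model (assumption, not swish-e's actual code): each hypothetical
# document maps a field name to the list of words indexed in that field.
docs = {
    "with_meta.html": {"keywords": ["pattern"], "body": ["some", "pattern", "words"]},
    "body_only.html": {"keywords": ["foobar"], "body": ["some", "pattern", "words"]},
}

def clause_score(doc, field, word):
    """Assumed score of one query clause: 1 + ln(tf) if the word occurs, else 0."""
    tf = doc.get(field, []).count(word)
    return 1 + math.log(tf) if tf else 0.0

def or_query_score(doc, clauses):
    """Assumed OR semantics: sum the scores of all matching clauses."""
    return sum(clause_score(doc, field, word) for field, word in clauses)

# "keywords=pattern or pattern": one clause restricted to the meta field,
# one against the body (a stand-in for the default, unqualified search).
query = [("keywords", "pattern"), ("body", "pattern")]
scores = {name: or_query_score(doc, query) for name, doc in docs.items()}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.1f} {name}")
```

In this model the document with "pattern" in its keywords matches both clauses and scores 2.0. The other document matches only the body clause and scores 1.0.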

But now I have discovered another unexpected behaviour: I can't see any
difference in ranking between <b>foo</b> and a plain "foo":

-------snip------------
nikratio:~/test$ cat test1.html 
<html>
<head>
    <meta name="keywords" content="foobar">
    <title>A document</title>
</head>
<body>
some words blub baBasically, the rank of a single hit word is the log
base e of the number of times the word is found in the doc (it's<br>
pattern<br>
frequency). I suppose log(e) is to avoid making docs with a huge number
of words rank way higher than those with just a few.<br>
</body>
</html>
nikratio:~/test$ cat test2.html
<html>
<head>
    <meta name="keywords" content="foobar">
    <title>A document</title>
</head>
<body>
some words blub baBasically, the rank of a single hit word is the log
base e of the number of times the word is found in the doc (it's<br>
<b>pattern</b><br>
frequency). I suppose log(e) is to avoid making docs with a huge number
of words rank way higher than those with just a few.<br>
</body>
</html>
nikratio:~/test$ cat config 
IndexDir .
MetaNames keywords
IndexOnly .html

nikratio:~/test$ swish-e -c config 
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 40 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
40 unique words indexed.
4 properties sorted.                                              
2 files indexed.  805 total bytes.  118 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
nikratio:~/test$ swish-search -w pattern
# SWISH format: 2.2.1
# Search words: pattern
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.032 seconds
1000 ./test2.html "A document" 406
1000 ./test1.html "A document" 399
.
nikratio:~/test$ 
---------------
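The scores above are consistent with a ranking that looks only at how often a word occurs. The sketch below assumes a raw score of 1 + ln(frequency), loosely following the log-of-frequency idea in the test files' text. It adds an optional, hypothetical bonus for each emphasized hit and then scales the scores so the top hit gets 1000, mimicking the output above. With the bonus at 0 the two files tie, and only a nonzero emphasis weight separates them. Whether swish-e 2.2.1 gives any weight to <b> text is not established by this sketch.

```python
import math

def raw_rank(occurrences, emphasis_bonus):
    """occurrences: one boolean per hit, True if that hit was inside <b>.

    Assumed formula: 1 + ln(frequency), plus a hypothetical bonus per
    emphasized hit (a bonus of 0.0 models a ranker that ignores <b>).
    """
    if not occurrences:
        return 0.0
    base = 1 + math.log(len(occurrences))
    return base + emphasis_bonus * sum(occurrences)

def scaled(ranks):
    """Scale raw ranks so the best hit gets 1000, like the output above."""
    top = max(ranks.values())
    return {name: round(1000 * r / top) for name, r in ranks.items()}

# Each test file contains "pattern" exactly once; only test2.html wraps it in <b>.
hits = {"test1.html": [False], "test2.html": [True]}

no_bonus = scaled({name: raw_rank(occ, 0.0) for name, occ in hits.items()})
with_bonus = scaled({name: raw_rank(occ, 0.5) for name, occ in hits.items()})
print(no_bonus)
print(with_bonus)
```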

Can someone explain this behaviour to me as well?

   --Nikolaus
Received on Wed Dec 18 14:06:59 2002