Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hello!
>
> I want to rank some keywords in the head of an HTML file higher
> than all words in the body. How can I achieve that? The following
> files ranked equal when searching for "pattern".
Sorry for the obsolete question. Bill Moseley explained this in his
posting: You have to specify "MetaNames keywords" and to search for
"keywords=pattern or pattern". In this case the meta matches rank
higher.
But now I have discovered another unexpected behaviour. I can't see
any difference in rank between <b>foo</b> and plain "foo":
-------snip------------
nikratio:~/test$ cat test1.html
<html>
<head>
<meta name="keywords" content="foobar">
<title>A document</title>
</head>
<body>
some words blub baBasically, the rank of a single hit word is the log
base e of the number of times the word is found in the doc (it's<br>
pattern<br>
frequency). I suppose log(e) is to avoid making docs with a huge number
of words rank way higher than those with just a few.<br>
</body>
</html>
nikratio:~/test$ cat test2.html
<html>
<head>
<meta name="keywords" content="foobar">
<title>A document</title>
</head>
<body>
some words blub baBasically, the rank of a single hit word is the log
base e of the number of times the word is found in the doc (it's<br>
<b>pattern</b><br>
frequency). I suppose log(e) is to avoid making docs with a huge number
of words rank way higher than those with just a few.<br>
</body>
</html>
nikratio:~/test$ cat config
IndexDir .
MetaNames keywords
IndexOnly .html
nikratio:~/test$ swish-e -c config
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 40 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
40 unique words indexed.
4 properties sorted.
2 files indexed. 805 total bytes. 118 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
nikratio:~/test$ swish-search -w pattern
# SWISH format: 2.2.1
# Search words: pattern
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.032 seconds
1000 ./test2.html "A document" 406
1000 ./test1.html "A document" 399
.
nikratio:~/test$
---------------
Maybe someone can explain this behaviour to me too?
--Nikolaus
Received on Wed Dec 18 14:06:59 2002