Re: Tuning ranking manually with MetaNamesRank

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Jun 09 2005 - 12:41:09 GMT

koszalekopalek scribbled on 6/9/05 4:08 AM:

> I guess this is the define in swish.h:
> 
>     #define RANK_BIAS_RANGE 10 /* max/min range ( -10 -> 10, with zero being no bias ) */
> 

correct.


> To 'spam' the tags I 'multiplied' the strings by x99. I assume this has 
> an effect similar to setting the bias, right?
> 
> tuningcfg = (
> 'foo ' x 99,  'http://localhost/a.htm',
> 'bar ' x 99,  'http://localhost/b.htm',
> );


yes, I think so.


> Anyway -- I think what I am doing is becoming a hack on top of a hack. 
> Let's change it into a feature request:-)
> 
> The whole point is that I think it is useful to be able to manually 
> assign urls to selected keywords. (Remember that Google demo I mentioned 
> in my first post?)  The keyword/url pairs could be read from a plain 
> text file. The location of that file could be specified in the 
> configuration hash for spider.pl. This is easy. Now, once I index a URL 
> and I know that some 'keywords' are assigned to it, how do I tweak the 
> ranking? I thought that automatically inserted meta tags were a good 
> idea but maybe there is a better way?
> 

your method of assigning keywords to urls seems fine. Swish-e is an indexer, not 
a search engine, so putting the feature you're describing directly into the 
Swish-e code seems out of scope for what Swish-e is meant to do.
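
one way the keyword/url file could be turned into a meta tag, sketched in shell 
rather than inside spider.pl. Everything named here is an assumption for 
illustration: the file name tuning.txt, the tab-separated layout, the metaname 
'keyword', and the helper meta_for_url. Keywords are not HTML-escaped.

```shell
# tuning.txt (hypothetical): one "keyword<TAB>url" pair per line
printf 'foo\thttp://localhost/a.htm\nbar\thttp://localhost/b.htm\n' > tuning.txt

# meta_for_url URL: print a meta tag holding every keyword assigned to URL,
# or print nothing when the URL has no entry in tuning.txt
meta_for_url() {
    awk -F '\t' -v url="$1" '
        $2 == url { words = (words == "" ? $1 : words " " $1) }
        END { if (words != "") printf "<meta name=\"keyword\" content=\"%s\" />\n", words }
    ' tuning.txt
}

meta_for_url http://localhost/a.htm
```

the tag would still have to be spliced into the fetched page before it is 
indexed, and the metaname has to be named in the query to have any effect.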

Are you including your 'keyword' metaname in the search?

here's a test I just did. Notice that when I don't name the biased metaname 
explicitly in the query, swish-e searches only the swishdefault metaname. 
I spell out swishdefault= in the later queries for demonstration.

Both files contain the words 'foo' and 'bar' 3 times each, but in swapped 
locations: in one file a word is in mymeta, in the other it is in the body 
(swishdefault). I index and search two times: once with a MetaNamesRank bias 
(using a config file) and once without (no config). Notice how with the bias 
on, the difference in rank scores is significant; with no bias, the rank is 
identical (frequency is equal, metaname is equal).

karpet@cartermac 45% swish-e -w foo
# SWISH format: 2.5.4
# Search words: foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.005 seconds
# Run time: 0.033 seconds
1000 file2.html "page one" 126
.
 
karpet@cartermac 46% swish-e -w swishdefault=foo or mymeta=foo
# SWISH format: 2.5.4
# Search words: swishdefault=foo or mymeta=foo
# Removed stopwords:
# Number of hits: 2
# Search time: 0.006 seconds
# Run time: 0.029 seconds
1000 file1.html "page one" 126
367 file2.html "page one" 126
.
 
karpet@cartermac 47% swish-e -w swishdefault=bar or mymeta=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar or mymeta=bar
# Removed stopwords:
# Number of hits: 2
# Search time: 0.005 seconds
# Run time: 0.032 seconds
1000 file2.html "page one" 126
367 file1.html "page one" 126
.
 
karpet@cartermac 48% cat file1.html
<html>
<head>
<meta name="mymeta" content="foo foo foo" />
<title>page one</title>
</head>
<body>
bar bar bar
</body>
</html>
 
karpet@cartermac 49% cat file2.html
<html>
<head>
<meta name="mymeta" content="bar bar bar" />
<title>page one</title>
</head>
<body>
foo foo foo
</body>
</html>
 
karpet@cartermac 50% cat c
MetaNamesRank 10 mymeta

karpet@cartermac 51% swish-e -i file*.html
..
Indexing done!
 
karpet@cartermac 52% swish-e -w swishdefault=bar or mymeta=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar or mymeta=bar
# Removed stopwords:
err: Unknown metaname: 'mymeta'
.
 
karpet@cartermac 53% swish-e -w swishdefault=bar
# SWISH format: 2.5.4
# Search words: swishdefault=bar
# Removed stopwords:
# Number of hits: 2
# Search time: 0.004 seconds
# Run time: 0.035 seconds
1000 file2.html "page one" 126
1000 file1.html "page one" 126
.
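
if you want to repeat the comparison, the session above can be scripted 
roughly like this. It is a sketch under assumptions: swish-e is on your PATH, 
the biased index is built by passing the config with -c, and the index names 
index.biased and index.plain are made up here so the two runs don't overwrite 
each other. The config is copied as-is from the session; if your build reports 
an unknown metaname with it, a MetaNames line for mymeta may also be needed.

```shell
#!/bin/sh
# write_page FILE META_WORD BODY_WORD
# Same layout as file1.html / file2.html in the session above.
write_page() {
    cat > "$1" <<EOF
<html>
<head>
<meta name="mymeta" content="$2 $2 $2" />
<title>page one</title>
</head>
<body>
$3 $3 $3
</body>
</html>
EOF
}

write_page file1.html foo bar   # foo in mymeta, bar in the body
write_page file2.html bar foo   # bar in mymeta, foo in the body

# Config from the session: bias mymeta by the maximum of 10.
echo 'MetaNamesRank 10 mymeta' > c

if command -v swish-e >/dev/null 2>&1; then
    # index.biased / index.plain are hypothetical names, not from the session
    swish-e -c c -i file1.html file2.html -f index.biased
    swish-e -i file1.html file2.html -f index.plain

    # With the bias: query both metanames and compare the rank column.
    swish-e -f index.biased -w 'swishdefault=bar or mymeta=bar'
    # Without the bias: mymeta is unknown, so only swishdefault is searched.
    swish-e -f index.plain -w 'swishdefault=bar'
else
    echo "swish-e not found; test files and config were written only" >&2
fi
```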



-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Jun 9 05:41:10 2005