Re: [swish-e] Users Digest, Vol 5, Issue 10

From: Peter Karman <peter(at)>
Date: Tue May 15 2007 - 23:58:24 GMT
Greg Ryjikh scribbled on 5/15/07 6:38 PM:
> Thanks Peter,
> Your first point explained my result. When I changed my search query 
> from -w (content=test) to -w (contentlabel=test or contentbody=test) 
> then I started to see the effect that MetaNamesRank gives. It all sounds 
> good in general but not "good enough" in our particular case. I provided 
> this simple test data just to show the problem. In reality the XML files we 
> need to search have a couple hundred different tags, and we don't 
> even know all of them in advance. We do want to search all of them but 
> give some priority to a few. I was planning to use
> UndefinedMetaTags auto
> and use a known top-level tag (or wrapper) "content" as the search 
> criterion, but it seems that ranking is not working in that case. Is 
> there any other way to give more "priority" to some meta tags but still search 
> content in all other tags without explicitly creating a huge search query 
> with all XML tag names?

First, make sure you read:

(though ignore the typo about MetaRankBias -- that should be MetaNamesRank and 
is fixed in svn trunk.)

The thing to notice is how MetaNamesRank is used in calculating rank scores. 
Basically it just artificially inflates the frequency count for a term in a 
document. Therefore other factors, like the doc's relative length and the term's 
IDF, will also pull the score one way or the other. One thing I haven't tried 
but have considered is recompiling Swish-e with RANK_BIAS_RANGE set to 
something much higher than the default '10' (like maybe 100 or 1000), because 
then setting MetaNamesRank to 50 would make that feature weigh more heavily in 
the algorithm.
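
To make that interaction concrete, here is a rough sketch in Python. It is not 
Swish-e's actual ranking formula: the function toy_rank, the (1 + bias) 
multiplier, and all the numbers are assumptions made up for illustration. It 
only shows how a small frequency boost can be outweighed by document length, 
and how a much larger bias would dominate.

```python
import math

def toy_rank(term_freq, doc_length, docs_with_term, total_docs, bias=0):
    """Toy score: biased term frequency scaled by IDF and document length.

    Hypothetical model for illustration only, NOT Swish-e's real formula.
    """
    # Assumption: the bias simply inflates the raw frequency count.
    boosted_freq = term_freq * (1 + bias)
    # Rarer terms (found in fewer docs) get a larger IDF.
    idf = math.log(total_docs / docs_with_term)
    # Longer documents are penalized relative to shorter ones.
    return boosted_freq * idf / doc_length

# Short document, one hit in a MetaName with no bias.
short_doc = toy_rank(term_freq=1, doc_length=50,
                     docs_with_term=10, total_docs=1000)

# Long document, one hit in a MetaName biased by 5.
long_doc_bias_5 = toy_rank(term_freq=1, doc_length=500,
                           docs_with_term=10, total_docs=1000, bias=5)

# Same long document, but with a bias of 50 (only possible in this toy
# model if the allowed range were raised well above 10).
long_doc_bias_50 = toy_rank(term_freq=1, doc_length=500,
                            docs_with_term=10, total_docs=1000, bias=50)

print(short_doc > long_doc_bias_5)    # length outweighs the small bias
print(long_doc_bias_50 > short_doc)   # a large bias outweighs length
```

In this toy model the short unbiased document still outranks the long document 
with a bias of 5, while a bias of 50 reverses the order.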

Second, it sounds like there are at least 2 things you are asking:

1. If you want to give a priority boost to a few MetaNames that you know 
in advance, you can still use "UndefinedMetaTags auto". That should work fine.

2. If you want to make searches look in 'content' by default, add an alias for 
that to the 'swishdefault' MetaName. See

for example.

You don't want to bias 'content' with MetaNamesRank at all; that will have zero 
net effect, since all words will be indexed under that MetaName.
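
A small Python sketch of why that is, again using a made-up scoring model 
rather than Swish-e's real one (toy_score, ranking, and the sample documents 
are all hypothetical). When every hit is counted under the same biased 
MetaName, every document's score is scaled by the same factor and the order 
does not change. A bias on a narrower MetaName can change the order.

```python
def toy_score(hits_by_metaname, bias_by_metaname):
    """Sum term hits per MetaName, each weighted by (1 + bias).

    Hypothetical model for illustration only, NOT Swish-e's real formula.
    """
    return sum(hits * (1 + bias_by_metaname.get(name, 0))
               for name, hits in hits_by_metaname.items())

def ranking(docs, bias_by_metaname):
    """Return document names ordered from highest to lowest toy score."""
    return sorted(docs,
                  key=lambda name: toy_score(docs[name], bias_by_metaname),
                  reverse=True)

# Every hit is indexed under the wrapper MetaName 'content'.
wrapped = {"a": {"content": 3}, "b": {"content": 1}}

# The same hits, indexed under the more specific MetaNames instead.
split = {"a": {"contentbody": 3}, "b": {"contentlabel": 1}}

# Biasing the wrapper scales both documents equally: order unchanged.
print(ranking(wrapped, {}) == ranking(wrapped, {"content": 5}))

# Biasing only 'contentlabel' lifts document b above document a.
print(ranking(split, {}), ranking(split, {"contentlabel": 5}))
```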

