Rank values. How are they generated?

From: William Bailey <wb(at)not-real.pro-net.co.uk>
Date: Thu Sep 04 2003 - 09:15:40 GMT

Hi All,

	I am currently the developer for a site that uses swish for searching its 
catalogue and have just been asked by the client the following question:

"For the FAQ we just need some general search score info rather than anything 
specific."

	Now, apart from saying "The most relevant results should have a higher 
score", I don't exactly know what to say.

	The data being searched is large and has a lot of meta fields defined, so 
how will this affect the score? If required, I can post sample data as well 
as config files.

	I know this use is probably not what swish was designed for, but it does 
the job well. The only feature I'm missing is searching for a range of 
values. I know it can be done with -L, but that only applies to properties 
and therefore one value per file. That is not enough for my requirements, as 
I would like to order the results by a field that can occur more than once, 
e.g. release dates. Anyway, before I get even more off topic :)

	For reference here is a typical query along with swish output...

User searches for:
	* artist: "Black Sabbath"
	* include compilation recordings in artist search.
	* track: iron man
	* format: CD
	* order: Search relevance (highest to lowest)


The following command gets run:

/usr/local/bin/swish-e -H 9 -d\\t -w '(  (  recording.artist.main=( black 
sabbath )  OR  recording.track.artist.main=( black sabbath )  OR  
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  OR  
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  )  AND  
recording.track.title=(iron man)  AND  recording.media.available.group=( -cd- 
)  AND  recording.available=( yes )  AND  recording.chanel=(musicmaster)  )'  
-s swishrank desc recording.title asc recording.artist.main asc -b 0 -m 3000 
-f /usr/home/wb/Web/Work/red-phase3/_server/data/swish/data.index
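
For illustration only, here is a rough Python sketch of how the -w query 
string above could be assembled from the user's choices. The helper names 
(clause, any_of, all_of, build_query) are made up for this example; they are 
not part of swish-e and are not the site's actual code. The metanames and 
values are copied from the command above.

```python
def clause(field, value):
    """Format one metaname term in the form used above: field=( value )."""
    return f"{field}=( {value} )"


def any_of(clauses):
    """Join clauses with OR inside one parenthesised group."""
    return "( " + " OR ".join(clauses) + " )"


def all_of(clauses):
    """Join clauses with AND inside one parenthesised group."""
    return "( " + " AND ".join(clauses) + " )"


def build_query(artist, artist_md5, track, media_group):
    # The artist may match on the recording or on any track (the
    # "include compilation recordings" option), by words or by MD5 key.
    artist_part = any_of([
        clause("recording.artist.main", artist),
        clause("recording.track.artist.main", artist),
        clause("recording.artist.main.md5", artist_md5),
        clause("recording.track.artist.main.md5", artist_md5),
    ])
    # Every remaining condition must hold, so they are ANDed together.
    return all_of([
        artist_part,
        clause("recording.track.title", track),
        clause("recording.media.available.group", media_group),
        clause("recording.available", "yes"),
        clause("recording.chanel", "musicmaster"),
    ])


query = build_query(
    artist="black sabbath",
    artist_md5="b1dd10efa6a2761536d12edc20edeca9",
    track="iron man",
    media_group="-cd-",
)
print(query)
```

The resulting string is what gets passed to swish-e after -w; the spacing 
differs slightly from the command above, but the structure of the ANDs and 
ORs is the same.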




And I get the following output...

# SWISH format: 2.2.2
# Search words: (  (  recording.artist.main=( black sabbath )  OR  
recording.track.artist.main=( black sabbath )  OR  
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  OR  
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  )  AND  
recording.track.title=(iron man)  AND  recording.media.available.group=( -cd- 
)  AND  recording.available=( yes )  AND  recording.chanel=(musicmaster)  )
#
# Index File: /usr/home/wb/Web/Work/red-phase3/_server/data/swish/data.index
# Swish-e format: 2.2.2
#
# Name: searchRED Data File
# Saved as: data.index
# Counts: 3260219 words, 560604 files
# Indexed on: 2003-08-19 17:32:10 BST
# Description: This is an index of the searchRED data.
# Pointer: http://www.searchred.co.uk/
# Maintained by: William Bailey
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Indexing Mode: None
# IgnoreTotalWordCountWhenRanking: 1
# WordCharacters: 
#&'-/0123456789;abcdefghijklmnopqrstuvwxyz
# MinWordLimit: 1
# MaxWordLimit: 40
# BeginCharacters: 
&'(-0123456789;abcdefghijklmnopqrstuvwxyz
# EndCharacters: 
'),-.0123456789;abcdefghijklmnopqrstuvwxyz
# IgnoreFirstChar:
# IgnoreLastChar:
# StopWords:
# BuzzWords:
# Search Words: (  (  recording.artist.main=( black sabbath )  OR  
recording.track.artist.main=( black sabbath )  OR  
recording.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  OR  
recording.track.artist.main.md5=(b1dd10efa6a2761536d12edc20edeca9)  )  AND  
recording.track.title=(iron man)  AND  recording.media.available.group=( -cd- 
)  AND  recording.available=( yes )  AND  recording.chanel=(musicmaster)  )
# Parsed Words: ( ( recording.artist.main = ( black sabbath ) or 
recording.track.artist.main = ( black sabbath ) or recording.artist.main.md5 
= ( b1dd10efa6a2761536d12edc20edeca9 ) or recording.track.artist.main.md5 = ( 
b1dd10efa6a2761536d12edc20edeca9 ) ) and recording.track.title = ( iron man ) 
and recording.media.available.group = ( -cd- ) and recording.available = ( 
yes ) and recording.chanel = ( musicmaster ) )
#
# Number of hits: 14
# Search time: 0.322 seconds
# Run time: 0.336 seconds
1000    MM/000/423/195.xml      423195  44209
988     MM/000/267/295.xml      267295  31681
980     MM/000/012/875.xml      12875   26547
972     MM/000/374/094.xml      374094  21523
954     MM/000/326/899.xml      326899  12316
953     MM/000/012/853.xml      12853   14668
949     MM/000/012/890.xml      12890   20204
944     MM/000/012/867.xml      12867   15696
939     MM/000/385/532.xml      385532  8886
928     MM/000/012/876.xml      12876   21115
264     MM/000/221/749.xml      221749  14080
264     MM/000/302/828.xml      302828  11119
264     MM/000/219/742.xml      219742  11725
264     MM/000/374/832.xml      374832  8322
.
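
To make the numbers above easier to discuss, here is a minimal Python sketch 
that reads the result lines and expresses each rank relative to the top hit. 
It assumes the layout shown above: header lines start with "#", a lone "." 
ends the results, and each hit is a delimited line whose first field is the 
rank and second field is the path. The names Hit, parse_hits and 
relative_score are invented for this example, and the remaining columns are 
kept as opaque extra fields rather than guessing what they mean.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    rank: int          # first column of a result line
    path: str          # second column of a result line
    extra: List[str]   # any remaining columns, left uninterpreted


def parse_hits(output: str) -> List[Hit]:
    """Collect result lines, skipping '#' headers and the final '.'."""
    hits = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == ".":
            continue
        # Prefer the tab delimiter (as requested with -d); fall back to
        # any whitespace if the tabs were lost when the text was pasted.
        fields = line.split("\t") if "\t" in line else line.split()
        hits.append(Hit(int(fields[0]), fields[1], fields[2:]))
    return hits


def relative_score(hit: Hit, hits: List[Hit]) -> float:
    """Rank as a fraction of the highest rank in this result set."""
    top = max(h.rank for h in hits)
    return hit.rank / top


SAMPLE = (
    "# Number of hits: 14\n"
    "1000\tMM/000/423/195.xml\t423195\t44209\n"
    "988\tMM/000/267/295.xml\t267295\t31681\n"
    "264\tMM/000/221/749.xml\t221749\t14080\n"
    ".\n"
)

hits = parse_hits(SAMPLE)
for hit in hits:
    print(hit.path, hit.rank, round(relative_score(hit, hits), 3))
```

In the output above the best hit scores exactly 1000, which suggests the 
ranks are relative to the best match for this particular query rather than 
absolute values, so a relative score is probably the safest thing to 
describe in an FAQ.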

Thanks for any insight that anybody can provide.

-- 
Regards,
	William Bailey.
	Pro-Net Internet Services Ltd.
	http://www.pro-net.co.uk/
Received on Thu Sep 4 09:16:56 2003