Indexing small words/limited number of stopwords with Swish 1.1.

From: Bert Hiddink <hiddink(at)not-real.sipromicro.com>
Date: Tue Jul 07 1998 - 09:58:29 GMT
Hi,

I am using swish 1.1.1 on a virtual FreeBSD server. Their version 
does not allow me to install swish-e, so in the meantime I work with 
swish 1.1.1.

However, I came across two problems when indexing my site:

1) My files are in Spanish and therefore contain several names like 
'El Salvador', 'La Paz', etc., which are relevant search words for my 
database. So, when indexing, I commented out the options "Ignore 
Words" and "Ignore Limits" in the configuration file in order not to 
lose this information. However, when I search for "El Salvador", 
Swish gives me a 'no results' error message, although there are 
hundreds of files containing "El Salvador".

When analyzing the index file, I concluded that 'El' was not indexed, 
only 'Salvador'. This is very annoying because it is like indexing 
only 'York' and 'Angeles' instead of 'New York' and 'Los Angeles'.

How can I resolve this? Why does Swish skip my important small words 
when indexing, although the options "Ignore Words" and "Ignore 
Limits" are commented out?
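
The symptom above is what one would see if an indexer dropped words 
below some minimum length regardless of the stopword settings. Whether 
Swish 1.1.1 actually does this is an assumption, not something 
confirmed here. The sketch below is illustrative Python with 
hypothetical names (`tokenize`, `index_words`, `min_length`); it is 
not Swish code.

```python
import re


def tokenize(text):
    """Split text into lowercase words (letters only, for simplicity)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def index_words(text, stopwords=frozenset(), min_length=3):
    """Return the set of words a hypothetical indexer would keep.

    min_length is an assumed built-in limit; it applies even when
    the stopword list is empty.
    """
    return {
        word
        for word in tokenize(text)
        if len(word) >= min_length and word not in stopwords
    }


kept = index_words("El Salvador y La Paz")
print(sorted(kept))  # 'el', 'y' and 'la' are dropped by length alone
```

With such a limit in place, emptying the stopword list changes 
nothing for two-letter words; only lowering the limit itself 
(`min_length=1` in this sketch) would let 'El' and 'La' into the index.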

2) Alternatively, I decided to include 'El' and 'La' in the stopword 
list and to index again. Now, when searching for 'El Salvador', Swish 
gives the error message 'one or more words too common', so that at 
least the searcher knows that he or she should not enter those small 
words and should try again with 'Salvador' or 'Paz'.

However, it seems that Swish only accepts the first 300 words in the 
list. The remaining words are treated as described under 1). Is there 
a limit to the number of stopwords?
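
A cap like this would be consistent with a stopword table of fixed 
size that silently ignores entries past its capacity. This is a guess 
about the cause, not a confirmed detail of Swish. The Python below 
uses hypothetical names (`load_stopwords`, `capacity`), and the figure 
300 comes only from the observation above.

```python
def load_stopwords(words, capacity=300):
    """Keep only the first `capacity` words, as a fixed-size table would."""
    accepted = []
    for word in words:
        if len(accepted) >= capacity:
            break  # later entries are silently ignored
        accepted.append(word.lower())
    return set(accepted)


# 300 filler words followed by the two words that matter
candidates = [f"word{i}" for i in range(300)] + ["el", "la"]
stopwords = load_stopwords(candidates)
print("el" in stopwords)  # 'el' is entry 301 and falls past the cap
```

If the cause is indeed a cap of this kind, moving the most important 
stopwords to the top of the list would keep them inside the limit.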

Please help me out!

Thanks a lot!

Bert Hiddink
hiddink@sipromicro.com
****************************

Bert Hiddink
FUNDACION GALILEO
Correo electronico: hiddink@sipromicro.com
Sitio: www.sipromicro.com
Tel. +506 280 8683
Telefax. +506 280 8886

****************************
Received on Tue Jul 7 09:07:53 1998