Re: [SWISH-E:349] Indexing small words/limited number of stopwords with Swish 1.1.

From: Roy Tennant <rtennant(at)not-real.library.berkeley.edu>
Date: Wed Jul 08 1998 - 16:49:23 GMT
In the config.h file, check this line:

#define MINWORDLIMIT

If the value after "MINWORDLIMIT" is 3 or greater, then no two-letter
word, including "el", will be indexed. If you change it, you need to
recompile SWISH.
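
To illustrate what a minimum word length does at indexing time, here is a
minimal sketch in C. It is not the actual SWISH source: the value 3 and the
helper names long_enough() and count_indexable() are assumptions made up for
this example, and the real setting is whatever your config.h defines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value for illustration only; check your own config.h. */
#define MINWORDLIMIT 3

/* Returns 1 if the word has at least min_len characters, otherwise 0. */
static int long_enough(const char *word, size_t min_len)
{
    return strlen(word) >= min_len;
}

/* Counts the space-separated words in text that pass the length check. */
static int count_indexable(const char *text, size_t min_len)
{
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        /* Skip any run of spaces before the next word. */
        while (*p == ' ')
            p++;

        /* Measure the word that starts here. */
        const char *start = p;
        while (*p != '\0' && *p != ' ')
            p++;

        size_t len = (size_t)(p - start);
        if (len > 0 && len >= min_len)
            count++;
    }
    return count;
}
```

With a limit of 3, count_indexable("el salvador", 3) keeps only "salvador";
lowering the limit to 2 lets "el" through as well, which is the effect you
want after editing config.h and recompiling.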
Roy Tennant

On Tue, 7 Jul 1998, Bert Hiddink wrote:

> Hi,
> 
> I am using swish 1.1.1 on a virtual FreeBSD-server. Their version 
> does not allow me to install swish-e so in the meantime I work with 
> swish 1.1.1.
> 
> However, I came across two problems when indexing my site:
> 
> 1) My files are in Spanish and therefore contain several names like 
> 'El Salvador', 'La Paz', etc. which are relevant search words for my 
> database. So, when indexing, I commented out in the configuration 
> file the options "Ignore Words" and "Ignore Limits" in order not to 
> 'lose' this information. However, when I do a search for "El 
> Salvador", Swish gives me a 'no results' error message, although 
> there are hundreds of files containing "El Salvador".
> 
> When analyzing the index file, I concluded that 'El' was not indexed, 
> only 'Salvador'. This is very annoying because it would be similar to 
> indexing only 'York' and 'Angeles' and not 'New York' and 'Los Angeles'.
> 
> How can I resolve this? Why does Swish skip 'my important small words' 
> when indexing, although the options "Ignore Words" and "Ignore 
> Limits" are commented out?
> 
> 2) Alternatively, I decided to include 'El' and 'La' in the stopword 
> list and to index again. Now when looking for 'El Salvador', 
> Swish gives the error message 'one or more words too common', so 
> that at least the searcher knows that he or she should not include those 
> small words and should try again with 'Salvador' or 'Paz'.
> 
> However, it seems that Swish only accepts the first 300 words in the 
> list. The other words are treated as described under 1). Is there a 
> limit to the number of stopwords?
> 
> Please help me out!
> 
> Thanks a lot!
> 
> Bert Hiddink
> hiddink@sipromicro.com
> ****************************
> 
> Bert Hiddink
> FUNDACION GALILEO
> Correo electronico: hiddink@sipromicro.com
> Sitio: www.sipromicro.com
> Tel. +506 280 8683
> Telefax. +506 280 8886
> 
> ****************************
> 
Received on Wed Jul 8 09:56:35 1998