SWISH-E index file larger than expected

From: Gordon K Smyth <gks(at)>
Date: Thu Apr 02 1998 - 01:51:53 GMT

Dear All,

Thanks for providing SWISH-E and such a well documented site.  SWISH-E is
the only freely available search engine I know of which can restrict
searches to specified html tags and meta tags - you might make more of this
in your feature list.

I've tried out SWISH-E by downloading the Sun Sparc executable for SWISH-E
1.1, and I have a question about the size of the index file.  My site has
about 900 html and txt files (mainly html), containing a total of just over
5MB of text.  The list of SWISH-E features at

tells me that the index file should be at most 5% of the size of the
original files, but the index file is actually over 1.25MB, in other words
25% of the size of the original files.
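
For anyone wanting to check the same ratio on their own site, here is a rough sketch in Python. It is not part of SWISH-E: the helper names source_size and index_ratio are made up for illustration, the extension list is simply copied from the IndexOnly line of the config file below, and the FileRules exclusions are ignored, so the source total may come out slightly high.

```python
import os

# Extensions copied from the IndexOnly line of the config file (assumption:
# matching is done on the exact, case-sensitive file name suffix).
INDEXED_EXTENSIONS = (".html", ".htm", ".txt")


def source_size(root, extensions=INDEXED_EXTENSIONS):
    """Total size in bytes of files under root ending in one of extensions."""
    total = 0
    # os.walk does not follow symbolic links by default, which mirrors
    # the "FollowSymLinks no" setting.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def index_ratio(index_bytes, source_bytes):
    """Index size as a fraction of the source text size."""
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    return index_bytes / source_bytes


# Figures quoted above: a 1.25MB index for about 5MB of text.
print(f"{index_ratio(1.25e6, 5e6):.0%}")
```

On a real site one would call source_size on the IndexDir path and os.path.getsize on the IndexFile path, then pass both numbers to index_ratio.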

Has anyone else had this experience?  Is there a common mistake I might
have made?  I append a copy of my configuration file.

Gordon Smyth
--------------------- config file --------------------------
IndexDir /home/gks/www
IndexFile /www/httpd/cgi-bin/gks/swish-e/index
MetaNames description keywords
IndexName "WebGuide"
IndexDescription "Guide to the Web for Statisticians." 
IndexPointer ""
IndexAdmin "Gordon Smyth, ("
IndexOnly .html .htm .txt
IndexReport 2
FollowSymLinks no
NoContents .gif .xbm .au .mov .mpg .pdf .ps .jpg .jpeg
ReplaceRules replace "/home/gks/www" "../../~gks"
FileRules pathname contains _private testing
FileRules filename contains # % ~ .bak .old
FileRules title contains testing
FileRules directory contains .htaccess
IgnoreLimit 50 500

Dr Gordon K Smyth           Telephone:  7-3365-3116, Fax:  7-3365-1477
Department of Mathematics, University of Queensland, Q 4072, Australia
E-mail:  gks(at)    WWW:
Received on Wed Apr 1 17:59:14 1998