Re: [swish-e] IndexOnly and NoContents, and indexing image files

From: Bill Moseley <moseley(at)>
Date: Fri Mar 16 2007 - 06:08:43 GMT
On Fri, Mar 16, 2007 at 05:25:53PM +1300, Lesley Walker wrote:
> However, if we do that, we get a zillion warning messages saying that 
> null characters have been substituted with newlines in .gif files.  
> This would seem to indicate that Swish-e is attempting (rather 
> uselessly) to index the content of the .gif files.

That's a bug, I suspect.

> > IndexOnly .html .htm .txt .cnt .gif .shm .xbm .au .mov .mpg  .doc .pdf
> > NoContents .gif .xbm .au .mov .mpg

You have to tell swish that these are not HTML files -- by default it
assumes that everything you are indexing is HTML.

Try adding:

    IndexContents TXT .gif .xbm .au .mov .mpg

You will still get the warning about the embedded null chars.
I suspect you could get the swish-e source and modify file.c to skip
that substitution when the file is flagged index_no_content.

That code to substitute nulls was problematic from the start many
years ago.  Maybe this is all it takes:

Index: src/file.c
--- src/file.c  (revision 1899)
+++ src/file.c  (working copy)
@@ -279,7 +279,7 @@
         /* JFP - substitute null chars, VFC record may have null char in reclen word, try to discard them */
-        if ( is_text && strlen( (char *)buffer ) < bytes_read )
+        if ( !fprop->index_no_content && is_text && strlen( (char *)buffer ) < bytes_read )
             int i;
             int j = 0;
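
To make the intent of that one-line change easier to see outside of
file.c, here is a minimal standalone sketch of the same guard. It is not
the swish-e code: struct file_props and substitute_nulls() are
hypothetical names made up for illustration, and the loop assumes the
"substitute with newlines" behaviour described in the warning message
rather than whatever file.c really does with its i and j counters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for swish-e's per-file properties;
 * only the one flag used by the patch is modeled here. */
struct file_props {
    int index_no_content;   /* nonzero when the file matched NoContents */
};

/* Replace embedded NUL bytes with newlines, unless the file's contents
 * will not be indexed anyway.  Returns the number of bytes replaced. */
size_t substitute_nulls(unsigned char *buffer, size_t bytes_read,
                        const struct file_props *fprop, int is_text)
{
    size_t replaced = 0;
    size_t i;

    assert(buffer != NULL && fprop != NULL);

    /* Same idea as the patch: leave NoContents files alone. */
    if (fprop->index_no_content || !is_text)
        return 0;

    /* Cheap check: no NUL within the bytes read means nothing to do. */
    if (memchr(buffer, '\0', bytes_read) == NULL)
        return 0;

    for (i = 0; i < bytes_read; i++) {
        if (buffer[i] == '\0') {
            buffer[i] = '\n';
            replaced++;
        }
    }
    return replaced;
}
```

With the flag set, a buffer read from a .gif comes back untouched and
the function reports zero replacements, so there would be nothing to
warn about.  With the flag clear, each embedded null becomes a newline
as before.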

Do you really need to index your file names?  It's not a feature that
seems to be used very often.

Bill Moseley

