Re: Warning: Substituted possible embedded null

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Mar 04 2002 - 19:08:40 GMT
At 06:34 AM 03/04/02 -0800, HostMaster wrote:
>Warning: Substituted possible embedded null character(s) in file

This is mentioned briefly in the 2.1-dev docs FAQ.

That probably means you are trying to index the contents of a binary
file.  The history of that message is this:

Someone reported that an HTML document was not being indexed completely.
It turned out the file had an embedded null, so swish was not indexing
past the null.  At first I just added a warning that there was a null;
later I decided to index the entire document anyway and spit out a
warning that a null was found.
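
To illustrate the idea (this is only a rough Python sketch, not the actual swish-e code, and the function names are made up for the example): null-terminated string handling only sees the text before the first null, while substituting the nulls lets the whole document through and warns about it.

```python
import warnings


def c_string_view(data: bytes) -> bytes:
    """What null-terminated string handling would see:
    everything before the first NUL byte."""
    return data.split(b"\x00", 1)[0]


def substitute_nulls(data: bytes, name: str) -> bytes:
    """Replace embedded NUL bytes with spaces so the whole document
    can be processed, and warn if any were found.
    (Hypothetical helper; the replacement character is an assumption.)"""
    if b"\x00" in data:
        warnings.warn(
            f"Substituted possible embedded null character(s) in file {name}"
        )
        data = data.replace(b"\x00", b" ")
    return data


if __name__ == "__main__":
    doc = b"<p>first</p>\x00<p>second</p>"
    print(c_string_view(doc))                  # stops at the null
    print(substitute_nulls(doc, "page.html"))  # whole document, plus a warning
```

A file full of nulls (a JPEG or PDF, say) would trigger the warning on nearly every fetch, which is why the message usually points at binary content.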

>IndexOnly .htm .html .asp
>IndexContents HTML XML

Now, IndexOnly is listed under the config section:

    "Directives for the File Access method only"

so IndexOnly doesn't apply when spidering with the HTTP method.

I think your option is this:

NoContents .jpg .pdf .gif

I'm not sure whether IndexOnly being ignored by the HTTP method is
different from previous behavior or has always been this way.  Maybe
it's a bug.

To do it right you really need a way to check content-types, not path
names, and spider.pl allows you to do that.
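
Here is roughly why path names are the weaker test, sketched in Python. This is not spider.pl's interface; the function names and the list of indexable types are assumptions for the example. A URL with no extension passes the extension filter even if the server returns a PDF, whereas the Content-Type header says what was actually sent.

```python
import posixpath
from urllib.parse import urlparse

# Extensions from the NoContents line above.
SKIP_EXTENSIONS = {".jpg", ".pdf", ".gif"}

# Assumed set of types worth indexing; adjust to taste.
INDEXABLE_TYPES = {"text/html", "text/plain", "text/xml", "application/xml"}


def skip_by_path(url: str) -> bool:
    """Extension check: looks only at the path part of the URL."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext in SKIP_EXTENSIONS


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def index_by_content_type(content_type_header: str) -> bool:
    """Content-type check: looks at what the server says it sent."""
    return media_type(content_type_header) in INDEXABLE_TYPES


if __name__ == "__main__":
    url = "http://example.com/download?id=5"
    print(skip_by_path(url))                         # extension filter sees nothing to skip
    print(index_by_content_type("application/pdf"))  # header reveals it is not text
```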


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Mar 4 19:09:10 2002