At 06:34 AM 03/04/02 -0800, HostMaster wrote:
>Warning: Substituted possible embedded null character(s) in file
This is mentioned briefly in the 2.1-dev docs FAQ.
What that probably means is you are trying to index the contents of a
binary file. The history of that message is this:
Someone reported that an HTML document was not being indexed completely.
It turned out the file had an embedded null, so swish was not indexing
past the null. At first I just printed a warning that there was a null.
Later I decided to index the entire document anyway and print a warning
that a null was found.
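
The idea of "substitute the nulls, warn, and keep going" can be sketched
roughly like this. This is only an illustration in Python, not swish's
actual code; the function name and the choice of a space as the replacement
character are my assumptions.

```python
import sys


def substitute_nulls(data: bytes, name: str, replacement: bytes = b" ") -> bytes:
    """Replace embedded NUL bytes so a parser that stops at NUL sees the whole file.

    Hypothetical helper: the replacement byte (a space) is an assumption.
    """
    count = data.count(b"\x00")
    if count:
        # Warn, but keep going so the rest of the document is still indexed.
        print(
            f"Warning: Substituted {count} possible embedded null "
            f"character(s) in file '{name}'",
            file=sys.stderr,
        )
    return data.replace(b"\x00", replacement)
```

With this sketch, substitute_nulls(b"ab\x00cd", "page.html") gives back
b"ab cd" and prints one warning on stderr.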
>IndexOnly .htm .html .asp
>IndexContents HTML XML
Now, IndexOnly is listed in the config docs under the section
"Directives for the File Access method only",
so IndexOnly doesn't apply when spidering with the HTTP method.
I think your option is NoContents, which indexes only the path name of
matching files and skips their contents:
NoContents .jpg .pdf .gif
I'm not sure if this is different from previous behavior, or has always
been this way, or what. Maybe it's a bug.
To do it right you really need a way to check content-types, not path
names, and spider.pl allows you to do that.
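
To show why the content-type is the better test than the path name, here is
a rough sketch in Python. It is not spider.pl's interface; the function
name and the list of allowed types are assumptions for illustration only.

```python
def should_index(content_type_header: str,
                 allowed=("text/html", "text/xml", "application/xml")) -> bool:
    """Decide from a Content-Type header value whether to index the body.

    Hypothetical helper: the allowed list is an assumption, not a swish default.
    """
    # Drop parameters such as "; charset=ISO-8859-1" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in allowed
```

A URL ending in .asp that is served as "text/html; charset=ISO-8859-1" would
pass this check, while "image/jpeg" or "application/pdf" would not, whatever
the path looks like.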
Received on Mon Mar 4 19:09:10 2002