Re: Quick question

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Apr 12 2002 - 21:22:18 GMT

At 01:34 PM 04/12/02 -0400, Tim Cantin wrote:
>> Oh, don't use merge, just index everything at once, or search multiple
>> index files.
>
>How do I do that; in the .config file?

Hi Tim,

No, not in the config file.  You just list all of the index files after -f
when you search:

    ./swish-e -w foo -f index1 index2 index3 index4
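
If the search is run from a script, the same command line can be built from
a list of index files.  This is only a rough sketch: the function name and
the default binary path are made up for illustration, and it uses nothing
beyond the -w and -f switches shown in this message.

```python
import shlex


def build_search_command(query, index_files, binary="./swish-e"):
    """Return the argv list for searching several index files at once.

    `binary` and this helper are hypothetical; only -w and -f come from
    the command shown in the message.
    """
    if not index_files:
        raise ValueError("at least one index file is required")
    # All index files are listed after a single -f switch.
    return [binary, "-w", query, "-f", *index_files]


cmd = build_search_command("foo", ["index1", "index2", "index3", "index4"])
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run() as-is, which avoids
shell quoting problems if an index path contains spaces.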

>Thu Apr 11 14:13:20 EDT 2002
>Indexing Data Source: "File-System"
>Indexing "/www/data/"
>
>Warning: Substituted possible embedded null character(s) in file
>'/www/data/Computing/Y2Kfacstaff.html'

Hmm.  Is it possible that those are not really .html files?

That's telling you that the OS said the file was X bytes long.  Those X
bytes were loaded into a buffer in swish, but then strlen(), which uses \0
to mark the end of the string, returned a value less than X.  Swish then
went through and changed the nulls to some other char (space or a \n, I
can't remember).
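
To make the check concrete, here is a small illustration in Python.  It is
not swish's actual C code, just a sketch of the same idea: compare the
length up to the first null against the full buffer size, and replace the
nulls if they differ.  The function names are invented, and the space used
as the replacement is an assumption (it may be a newline in swish).

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C strlen(): count bytes up to the first NUL byte."""
    pos = buf.find(b"\x00")
    return len(buf) if pos == -1 else pos


def substitute_nulls(buf: bytes, replacement: bytes = b" "):
    """Return (cleaned_buffer, had_nulls).

    had_nulls is True when the strlen-style length is shorter than the
    real buffer size, i.e. the data contains embedded NUL bytes.
    The replacement byte is an assumption, not swish's documented choice.
    """
    had_nulls = c_strlen(buf) < len(buf)
    if had_nulls:
        buf = buf.replace(b"\x00", replacement)
    return buf, had_nulls


data = b"<html>Y2K\x00notes</html>"
cleaned, warned = substitute_nulls(data)
if warned:
    print("Warning: substituted embedded null character(s)")
```

A plain HTML file should never trip this check, which is why the warning
suggests the file is binary or otherwise not what it claims to be.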

>err: Buffer too short in coalesce_word_locations. Increase
>COALESCE_BUFFER_MAX_SIZE in config.h and rebuild.

I'm not 100% sure, but that probably means you are indexing very, very long
docs (or binary files).  My guess, looking at both errors, is that you are
indexing something other than what you think you are indexing.

David Norris had this problem the other day trying to index books.  The
recommendation was to index smaller chunks -- as that would result in
better searches anyway.
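
One way to do that splitting before indexing is sketched below.  This is
not a swish feature, just a hypothetical preprocessing step: the function
name and the 20000-character limit are arbitrary choices for illustration.
It breaks at blank lines where it can and hard-splits any single paragraph
that is still too long.

```python
def split_into_chunks(text: str, max_chars: int = 20000):
    """Split text into chunks of at most max_chars characters.

    Breaks at paragraph boundaries (blank lines) where possible.
    The default limit is arbitrary, not a swish requirement.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is hard-split.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be written to its own file (one per chapter or
section, say) and indexed as a separate document.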


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Apr 12 21:23:47 2002