Re: Indexing problem os -S Prog option

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri May 10 2002 - 17:05:03 GMT
At 05:48 AM 05/10/02 -0700, Cristiano Corsani wrote:
>Running it in swish by means of the -S Prog option works for a while and
>then stops with the error: "External program failed to return required headers
>Path-Name: & Content-Len". The problem is not an incorrect Header + XML2
>build, because indexing the record that causes the crash works fine.

..

>I saw which record seems to crash the out stream ... trying to re-index
>beginning from some records before it does not crash on the same record
>anymore, but some hundreds of thousands of records later ... so it does not
>depend on one record. It seems to depend on my program failing for some
>reason when building the out stream, or on swish for some reason.

Hum, well if you can't bracket the problem to a few files then it's
hard to say what is happening.  Every time I've seen this in the past it's
been because an extra \n or some such char was placed in the input and not
counted in the content-length setting.  But I have not indexed that many
files before.
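
To make the "extra \n not counted" case concrete, here is a minimal sketch of
the producer side in C.  The function name format_record and the buffer-based
interface are made up for illustration; only the Path-Name: and
Content-Length: headers and the blank line come from what -S prog expects.
The point is that the length in the header and the number of bytes written
come from the same variable, so a trailing newline cannot be emitted without
being counted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of swish-e): formats one document record
 * into `out`.  Returns the total number of bytes written, or 0 if `out`
 * is too small to hold the headers plus the body. */
size_t format_record(char *out, size_t out_size,
                     const char *path, const char *body, size_t body_len)
{
    assert(out != NULL && path != NULL && body != NULL);

    /* Headers, then the blank line that marks the start of the body. */
    int hdr_len = snprintf(out, out_size,
                           "Path-Name: %s\nContent-Length: %zu\n\n",
                           path, body_len);
    if (hdr_len < 0 || (size_t)hdr_len >= out_size ||
        (size_t)hdr_len + body_len > out_size)
        return 0;

    /* Copy exactly body_len bytes: the same number the header announced. */
    memcpy(out + hdr_len, body, body_len);
    return (size_t)hdr_len + body_len;
}
```

If the body is "hello\n", body_len has to be 6, not 5; anything appended
after the body (a stray newline, say) has to be included in body_len as well.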

I have found it sometimes confusing to see what file was causing the error
due to buffering of the output -- stderr reports the error, but stdout is
buffered so it's hard to match the file with the error.  I think extprog.c
should buffer the previous file name and report that, as that would give
better info.  I'll add that next week.

Here's the code:

        if (strlen(line) == 0) /* blank line indicates body */
        {
            if (!fsize || !real_path)
                progerr("External program failed to return required headers "
                        "Path-Name: & Content-Length:");

-S prog works by reading headers until it finds a blank line.  When the
blank line is found (as you can see above) it then makes sure that both a
path name header and a content length header were set.  It then reads
content-length bytes from the input stream.  So if the content-length
header from the *previous* file was wrong, say by one char, then extprog.c
will start reading the *next* file off by one char, and then not read the
headers correctly.  That would be my guess.
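
The off-by-one effect described above can be reproduced with a small
stand-in for the reader.  This is a simplified sketch, not the real
extprog.c: count_records is a made-up name, it parses a NUL-terminated
string instead of a pipe, and it assumes text bodies without embedded NUL
bytes.  It only mirrors the logic described here: read headers up to a
blank line, require both headers, then skip Content-Length bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the header loop; NOT the real extprog.c.
 * Returns the number of documents consumed from `stream`, or -1 when a
 * blank line arrives before both required headers have been seen. */
int count_records(const char *stream)
{
    assert(stream != NULL);
    size_t len = strlen(stream), pos = 0;
    int docs = 0, have_path = 0;
    long fsize = 0;

    while (pos < len) {
        /* Find the end of the current header line. */
        const char *line = stream + pos;
        const char *nl = memchr(line, '\n', len - pos);
        size_t line_len = nl ? (size_t)(nl - line) : len - pos;
        pos += line_len + (nl ? 1 : 0);

        if (line_len == 0) {                  /* blank line indicates body */
            if (!fsize || !have_path)
                return -1;                    /* "required headers" error */
            if (fsize < 0 || (size_t)fsize > len - pos)
                return -1;                    /* body runs past the input */
            pos += (size_t)fsize;             /* trust the header blindly */
            docs++;
            have_path = 0;
            fsize = 0;
        } else if (line_len >= 10 && memcmp(line, "Path-Name:", 10) == 0) {
            have_path = 1;
        } else if (line_len >= 15 &&
                   memcmp(line, "Content-Length:", 15) == 0) {
            fsize = strtol(line + 15, NULL, 10);
        }
    }
    return docs;
}
```

With two records whose bodies are "hello\n" and "world\n" and both lengths
given as 6, this returns 2.  Change only the first length to 5 and the
leftover "\n" is read as a blank header line for the second record, so the
failure is reported one record after the one that is actually wrong.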

Is it possible that your Java program is counting characters differently
than swish does?
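
One way that could happen, and this is only a guess about your setup: if the
records contain non-ASCII text and go out as UTF-8, a character count (which
is what Java's String.length() gives, in UTF-16 code units) will be smaller
than the byte count that swish reads.  A small C sketch of the difference,
where utf8_chars is a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points by skipping
 * continuation bytes, which all have the bit pattern 10xxxxxx. */
size_t utf8_chars(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 string "caf\xC3\xA9\n" (an accented e encoded as two bytes),
strlen() reports 6 bytes while utf8_chars() reports 5 characters.  A
Content-Length of 5 there would leave one byte unread, which is the kind of
drift that only shows up on records with such characters.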



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri May 10 17:06:25 2002