Re: "External program failed to return required headers"

From: john cooper <john(at)not-real.wrenhill.com>
Date: Wed Sep 03 2003 - 21:27:24 GMT
Hi,

We had the same problem on RH9; the solution was to do this:

LANG=en_GB
export LANG

in a shell and then run the indexing again. That worked fine for me (but 
then I'm in England).

The discussion was here:  http://swish-e.org/archive/4870.html
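
For anyone wanting to check their own setup first, here is a rough sketch (not from the swish-e docs). It assumes the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG; the function name is made up for illustration, and the "sh -c" line is only a stand-in for the real indexing command.

```shell
# Return success if the effective character-type locale names UTF-8.
# Assumed precedence (POSIX): LC_ALL, then LC_CTYPE, then LANG.
is_utf8_locale() {
    case "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}" in
        *[Uu][Tt][Ff]-8* | *[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "UTF-8 locale in effect; length() and byte counts may disagree"
fi

# Override LANG for one command only, leaving the current shell untouched.
# The sh -c call is a stand-in for wherever the indexing run would go.
LANG=en_GB sh -c 'echo "child process sees LANG=$LANG"'
```

Note that if LC_ALL is set in your environment, changing LANG alone will not change the effective locale.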

Cheers

John



Bill Moseley wrote:

>On Wed, Sep 03, 2003 at 01:02:53PM -0700, Thomas Dowling wrote:
>
>  
>
>>I am trying to use SWISH-E (I've tried both 2.2.3 and 2.4.0  pr1)  to 
>>spider our website.  Following directions in the documentation, I set up 
>>a basic swish.conf and spider.conf, and my indexing run always bombs 
>>with the message:
>>
>>err: External program failed to return required headers Path-Name: & 
>>Content-Length:
>>
>>I found what appeared to be an identical problem report in the list 
>>archives from last April (<http://swish-e.org/archive/5149.html>), but 
>>didn't see a definitive solution posted there.  None of the suggestions 
>>offered there affect the problem here.
>>    
>>
>
>That error message typically means the length was set wrong on the
>*previous* document, so when swish-e tries to read the next document
>it's reading from the wrong place in the stream.
>
>  
>
>>I took the liberty of inserting a line into spider.pl to print out the 
>>headers, and every document it reports on does have Path-Name and 
>>Content-Length headers, which makes me suspect the problem is either 
>>with swish-e itself or in the interaction between spider.pl and swish-e.
>>    
>>
>
>I often do things the hard way.  For example, I've saved the output from 
>spider.pl to a file, then extracted each document one by one and 
>verified that its Content-Length is indeed its byte length.
>
>The problem (possibly depending on the version of Perl and the LANG 
>setting) is that spider.pl uses length() to set the Content-Length 
>header, but for multi-byte chars (which swish-e won't support) the 
>length() and the byte size of the data can be two different things.  So I 
>have also edited spider.pl so that, where it grabs the length(), it 
>writes the document out to disk and stat()s it to check whether the 
>length is the same as the file size.
>
>  
>
>>I've tried this against multiple web sites.  The number of files scanned 
>>before the indexing run dies varies from site to site, but is consistent 
>>on each site.  FWIW, I'm running swish-e under RedHat 8.0 with Perl 
>>5.8.0 (and, if I'm reading things correctly, LWP 5.65).
>>    
>>
>
>I think it was RedHat 9 where the default LANG is UTF-8.  There have 
>been problems reported in this case.  I'm not sure if it applies to RH 
>8.0.
>
>Assuming that this is a multi-byte character problem:
>
>There is some code in spider.pl's output_content() function 
>that was supposed to fix this:
>
>    # ugly and maybe expensive, but perhaps more portable than "use bytes"
>    my $bytecount = length pack 'C0a*', $$content;
>
>$ perl -le '$x=chr(400); print length pack "C0a*", $x'
>2
>
>Here it is without, and then with, the "use bytes;" pragma:
>
>$ perl -le '$x=chr(400); print length $x'
>1
>
>$ perl -le '$x=chr(400); use bytes; print length $x'
>2
>
>
>
>  
>
Received on Wed Sep 3 21:27:43 2003