RE: External program failed to return required headers

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Mar 31 2003 - 21:24:48 GMT
On Mon, 31 Mar 2003, Nuno Ferreira wrote:

> I am not the sysadmin of the remote sites. I'll try to speak to them.
> I can test any patch that you want me to try.

You can just make a local copy of spider.pl, so you shouldn't need the
help of the sysadmin.

Then, as long as you are running something like Perl 5.6.1 or newer, look
in spider.pl for:

    my $headers = join "\n",
        'Path-Name: ' .  $uri,
        'Content-Length: ' . length $$content,
        '';

and replace it with something like:

    my $doc_length = do { use bytes; length $$content };

    my $headers = join "\n",
        'Path-Name: ' .  $uri,
        'Content-Length: ' . $doc_length,
        '';

I suppose you might even be able to just place:

   use bytes;

toward the top of spider.pl and it would work, too.  But there might be
some other side-effects so the above might be a safer fix for now.
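
To check the difference on your own Perl, here is a minimal sketch.  The
build_headers helper, the sample string and the localhost URL are made up
for illustration and are not part of spider.pl; the only assumption taken
from spider.pl is that $content is a scalar reference.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of spider.pl: builds the headers with
# Content-Length counted in bytes rather than characters.
sub build_headers {
    my ( $uri, $content ) = @_;    # $content is a scalar ref

    # "use bytes" is lexically scoped, so it only affects this do block.
    my $doc_length = do { use bytes; length $$content };

    return join "\n",
        'Path-Name: ' . $uri,
        'Content-Length: ' . $doc_length,
        '';
}

# chr(400) is a wide character, so Perl stores this string as UTF-8.
my $doc = "caf" . chr(400);

print build_headers( 'http://localhost/test.html', \$doc );
print "characters: ", length($doc), "\n";
```

The string has four characters, but the wide one takes two bytes, so the
Content-Length header should report 5 while plain "length" reports 4.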

> 
> Regards,
> Nuno
> 
> > -----Original Message-----
> > From: Bill Moseley [mailto:moseley@hank.org] 
> > Sent: Monday, 31 March 2003 15:35
> > To: Nuno Ferreira
> > Cc: Multiple recipients of list
> > Subject: Re: [SWISH-E] External program failed to return 
> > required headers Path-Name: & Content-Length:
> > 
> > 
> > On Mon, 31 Mar 2003, Nuno Ferreira wrote:
> > 
> > > It starts and it looks like it is doing everything I want, then it
> > > suddenly crashes with:
> > > <SNIP>
> > > Looking at extracted tag '<td background="/images/verao_foo_d.jpg">'
> > > ! Found 0 links in
> > > http://www.somesite.com/catalog/formas.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d
> > > 
> > > http://www.somesite.com/catalog/detras_loja.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d - Using DEFAULT (HTML2) parser -  (565 words)
> > > err: External program failed to return required headers Path-Name: &
> > > Content-Length:
> > > .
> > > </SNIP>
> > > 
> > > It always crashes in the same place. If I spider a different site, it
> > > crashes also and always in the same place.
> > > I've found this thread <http://swish-e.org/archive/3817.html> that is
> > > related to my problem but after reading it, I became even more confused
> > > because now I know that I may be looking at the wrong debug line because
> > > of the buffering issues.
> > 
> > First, see if this is a possible fix:
> > 
>   http://swish-e.org/archive/4870.html
> 
> 
> If you set debug => DEBUG_URL then it will display the URLs as they are
> fetched and before swish gets the document.  That should help find the
> exact document where the problem is happening.
> 
> But that error "failed to return required headers" is likely due to the
> *previous* document returning the wrong content length.  The way extprog
> works is it reads line-by-line to read the headers.  Then when it sees a
> blank line (that marks the end of the headers) it reads content-length
> bytes in from the external program and starts over.
> 
> If that content length was short one byte, and the last byte of the doc is
> a \n, then when it starts to read the next doc it will see just \n and
> assume that's the end of the headers.  But at that point no Content-Length
> or Path-Name header is set so the program aborts with that error.
> 
> I suspect what is happening is that the previous document has a wide char
> and is forcing perl into UTF-8 encoding.  spider.pl is using "length" to
> determine the length of the string, but that's the character length, not
> the byte length:
> 
> $ perl -MDevel::Peek -e '$x=chr(400);Dump($x);print "len=", length$x, "\n"'
> SV = PV(0x80f6344) at 0x80fd2a4
>   REFCNT = 1
>   FLAGS = (POK,pPOK,UTF8)
>   PV = 0x80f9e58 "\306\220"\0
>   CUR = 2
>   LEN = 3
> len=1
> 
> So the length of the string is two bytes, but "length" is returning one.
> That would result in your problem.
> 
> I need to find a portable way for use with all versions of Perl to read
> the correct byte length.
> 
> 
> 
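
The header/content framing described in the quoted message can be
simulated with a rough sketch like the one below.  This is not swish-e's
actual reader code: read_docs and the sample streams are hypothetical, and
the in-memory filehandle assumes Perl 5.8 or newer.  It only illustrates
why a Content-Length that is one byte short makes the *next* document fail.

```perl
use strict;
use warnings;

# Hypothetical reader: parses "headers, blank line, N bytes of content"
# records from a byte string.  Not swish-e's real implementation.
sub read_docs {
    my ($stream) = @_;
    open my $fh, '<', \$stream or die "open: $!";
    binmode $fh;

    my @docs;
    while ( !eof $fh ) {
        my %h;

        # Read header lines until a blank line marks the end of headers.
        while ( defined( my $line = <$fh> ) ) {
            last if $line eq "\n";
            chomp $line;
            my ( $name, $value ) = split /:\s*/, $line, 2;
            $h{$name} = $value;
        }

        die "failed to return required headers\n"
            unless defined $h{'Path-Name'} && defined $h{'Content-Length'};

        # Read exactly Content-Length bytes of document content.
        my $body = '';
        read $fh, $body, $h{'Content-Length'};
        push @docs, [ $h{'Path-Name'}, $body ];
    }
    return @docs;
}

my $good = "Path-Name: a\nContent-Length: 6\n\nhello\n"
         . "Path-Name: b\nContent-Length: 3\n\nbye";

# Same stream, but the first length is one byte short.
( my $bad = $good ) =~ s/Content-Length: 6/Content-Length: 5/;

print scalar( read_docs($good) ), " docs read\n";
eval { read_docs($bad) };
print "error: $@" if $@;
```

With the short length, the trailing \n of the first document is left in the
stream, gets taken as the blank line that ends the (empty) headers of the
second document, and the reader aborts there.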

-- 
Bill Moseley moseley@hank.org
Received on Mon Mar 31 21:25:46 2003