Re: [swish-e] err: External program failed to return required headers Path-Name (Swish-e 2.4.5)

From: <Rene.Kloos(at)not-real.esa.int>
Date: Thu Mar 29 2007 - 13:17:05 GMT
Hello Clint,

Although Bill Moseley can give you the most accurate answer, I can already
say that I ran into the same problem at one point. In your setup the
spider writes an output file, which Swish-e then reads as its input to
index. For every spidered page this file must contain several headers,
e.g. Path-Name and Content-Length. Swish-e uses the Content-Length value to
read in the next <content-length> bytes as the page content. The fact that
your warning contains 'h-Name' shows that Swish-e read 3 bytes too many,
i.e. 'Pat', so it did not find the next 'Path-Name' header where it
expected to find it. This means that the value listed in the Content-Length
header does not match the actual length of the content.

I guess this has to do with the UTF-8/Latin-1 issue when using libxml2, but I
am certainly no expert in that area :-)
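
To make the mechanism concrete, here is a minimal Python sketch of a reader
that trusts Content-Length. It is not Swish-e's or spider.pl's actual code:
the function names, the example.invalid URLs and the sample text are made up
for illustration, and counting the body as UTF-8 while writing it as Latin-1
is only one hypothetical way to end up 3 bytes off.

```python
def make_record(path, body_bytes, declared_length):
    """Build one record: headers, a blank line, then the body bytes."""
    header = f"Path-Name: {path}\nContent-Length: {declared_length}\n\n"
    return header.encode("ascii") + body_bytes


def first_unknown_header(stream):
    """Parse records; return the first unexpected header line, or None."""
    pos = 0
    while pos < len(stream):
        length = 0
        while True:
            end = stream.find(b"\n", pos)
            if end == -1:
                end = len(stream)
            line = stream[pos:end].decode("latin-1")
            pos = end + 1
            if line == "":
                break  # blank line: headers are done
            name, _, value = line.partition(": ")
            if name == "Content-Length":
                length = int(value)
            elif name != "Path-Name":
                return line  # reader is out of sync with the stream
        pos += length  # skip exactly the declared number of body bytes
    return None


text = "caf\u00e9 d\u00e9j\u00e0 vu\n"   # 13 characters, 3 of them non-ASCII
body = text.encode("latin-1")             # 13 bytes actually written
second = make_record("http://example.invalid/b.htm", b"plain\n", 6)

# Length matches the bytes written: the reader stays in sync.
good_stream = make_record("http://example.invalid/a.htm", body, len(body)) + second

# Length counted on the UTF-8 form (16 bytes) but body written as Latin-1 (13).
bad_stream = make_record(
    "http://example.invalid/a.htm", body, len(text.encode("utf-8"))
) + second

print(first_unknown_header(good_stream))  # None
print(first_unknown_header(bad_stream))   # h-Name: http://example.invalid/b.htm
```

In the second stream the reader swallows 'Pat' as part of the previous body,
which is the same symptom as in the warning.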

One of the earlier posts suggests modifying spider.pl as follows:

my $bytecount = length pack 'c0a*', $$content;

should become:

my $bytecount = do { use bytes; length($$content) };

This did actually NOT do the trick for me. The following DID:

my $bytecount = length($$content);

I have been happy indexing ever since (static pages, not dynamic ones).
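
Whichever variant works on a given system, the rule behind them is that the
header has to count exactly the bytes that get written. Below is a small
Python analogue of that idea. Perl's length and "use bytes" follow their own
rules, so this only illustrates the principle; write_record is a hypothetical
helper, not part of spider.pl.

```python
import io


def write_record(out, path, text, encoding):
    """Encode first, then count, so the header describes the bytes written."""
    body = text.encode(encoding)
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    out.write(header.encode("ascii"))
    out.write(body)
    return len(body)


text = "caf\u00e9 d\u00e9j\u00e0 vu\n"
char_count = len(text)  # characters, not bytes
utf8_length = write_record(io.BytesIO(), "http://example.invalid/a.htm", text, "utf-8")
latin1_length = write_record(io.BytesIO(), "http://example.invalid/a.htm", text, "latin-1")

print(char_count, utf8_length, latin1_length)  # 13 16 13
```

The character count only equals the byte count when the output encoding uses
one byte per character, which may be why the plain length() call can work in
one setup and not in another.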

Hope this helps,
René

users-bounces@lists.swish-e.org wrote on 29/03/2007 12:09:32:

> SWISH-E 2.4.5
>
> Linux 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 13:01:26 EST 2007 i686 i686
> i386 GNU/Linux
>
>
> I initially indexed only static pages, which worked fine. However it has
> become necessary to index the database driven pages as well.
>
> I set up spider.pl and got as far as having it generate the output.txt
> file, which is around 40MB+, using
>   /usr/local/lib/swish-e/spider.pl default http://my_server.com/index.html > output.txt
> No errors were reported.
>
> But now when I run
> swish-e -c config -S prog -i stdin < output.txt
>
> I get this fatal error soon after
>
> Warning: Unknown header line: 'h-Name: http://www.xxx.xxx/xx.htm'
> from program spider.pl
> err: External program failed to return required headers Path-Name:.
>
> I have looked up this error, but the posts are from 2003-2005 and,
> although they explain possible reasons why this is happening, they
> don't really show how to fix or work around this error.
>
> I'm only indexing html text files and text from dynamic pages, not
> images, pdfs or anything like that.
>
> How does one fix this?
>
> Regards
> Clint
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
Received on Thu Mar 29 09:17:25 2007