External program failed to return required headers Path-Name: & Content-Length:

From: Nuno Ferreira <nuno.ferreira(at)not-real.globalti.pt>
Date: Mon Mar 31 2003 - 13:38:05 GMT
Hi,

I'm using spider.pl to spider a few web sites and pass the pages to
swish-e, which indexes their contents as well as some metatags.

I'm using SWISH-E 2.2.3.

Here is my 'spider_config':
@servers = (
    {
        skip        => 0, 
	debug	=>	DEBUG_INFO | DEBUG_SKIPPED | DEBUG_LINKS | DEBUG_FAILED | DEBUG_HEADERS,
        base_url    => 'http://www.somesite.com/catalog',
        same_hosts  => [],
        agent       => 'swish-e spider http://swish-e.org/',
        email       => 'nuno.ferreira@globalti.pt',
	use_md5		=> 1,
        test_url    => sub { $_[0]->path =~ /catalog/ },
	test_response => sub {
			my $content_type = $_[2]->content_type;
			return $content_type =~ m!text/html!;
			#my $ok = grep { $_ eq $content_type } qw{ text/html text/plain };
			#return 1 if $ok;
			#return;
	},
        delay_min   => 0.01, 
        keep_alive  => 1,     
    }
);    

And here is my 'swish-e.conf':
IndexReport 3
IndexFile /usr/local/swish/somesite.dat
IndexDir /usr/local/bin/spider.pl
MetaNames descricao sku keywords nomeproduto
SwishProgParameters /usr/local/swish/spider_config

I start the spidering/indexing like this:
# swish-e -c /path/to/swish-e.conf -S prog

It starts and it looks like it is doing everything I want, then it
suddenly crashes with:
<SNIP>
Looking at extracted tag '<td background="/images/verao_foo_d.jpg">'
! Found 0 links in http://www.somesite.com/catalog/formas.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d
http://www.somesite.com/catalog/detras_loja.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d - Using DEFAULT (HTML2) parser -  (565 words)
err: External program failed to return required headers Path-Name: & Content-Length:
.
</SNIP>
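
For reference, the two headers named in the error belong to the format
that swish-e expects on the standard output of a -S prog program: a
Path-Name: and a Content-Length: header for each document, a blank line,
and then exactly that many bytes of content. The shell sketch below only
illustrates that format. It is not how spider.pl is implemented, and the
emit_doc helper and the example.html URL are made up for the illustration.

```shell
#!/bin/sh
# Sketch of a -S prog feeder: writes one document to stdout in the
# header format the error message refers to.
# emit_doc is a hypothetical helper name, not part of swish-e or spider.pl.

emit_doc() {
    path=$1
    body=$2
    # Content-Length must be the exact byte count of the body that follows.
    length=$(printf '%s' "$body" | wc -c | tr -d ' ')
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$length"
    printf '\n'            # a blank line ends the header block
    printf '%s' "$body"    # no trailing newline, so the count stays exact
}

# Hypothetical document, for illustration only.
emit_doc 'http://www.somesite.com/catalog/example.html' \
    '<html><body>Hello</body></html>'
```

If the Content-Length value and the number of bytes actually written
disagree, swish-e would presumably start looking for the next header
block in the middle of a document, which is one way an error like the
one above could arise.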

It always crashes in the same place. If I spider a different site, it
also crashes, and again always at the same point for that site.
I found this thread <http://swish-e.org/archive/3817.html>, which is
related to my problem, but after reading it I became even more confused:
because of the buffering issues, I now realise that I may be looking at
the wrong debug line.

Can anyone explain what is happening and, hopefully, post a solution?

TIA,
Nuno
Received on Mon Mar 31 13:42:07 2003