External program failed to return required headers Path-Name: & Content-Length:

From: Nuno Ferreira <nuno.ferreira(at)>
Date: Mon Mar 31 2003 - 13:38:05 GMT

I'm using a spider program to crawl a few sites and pass the pages to swish-e,
which indexes their contents as well as some meta tags.

I'm using SWISH-E 2.2.3.

Here is my 'spider_config':
@servers = (
    {
        skip          => 0,
        base_url      => '',
        same_hosts    => [],
        agent         => 'swish-e spider',
        email         => '',
        use_md5       => 1,
        test_url      => sub { $_[0]->path =~ /catalog/ },
        test_response => sub {
            my $content_type = $_[2]->content_type;
            return $content_type =~ m!text/html!;
            #my $ok = grep { $_ eq $content_type } qw{ text/html text/plain };
            #return 1 if $ok;
        },
        delay_min     => 0.01,
        keep_alive    => 1,
    },
);

And here is my 'swish-e.conf':
IndexReport 3
IndexFile /usr/local/swish/somesite.dat
IndexDir /usr/local/bin/
MetaNames descricao sku keywords nomeproduto
SwishProgParameters /usr/local/swish/spider_config

I start the spidering/indexing like this:
# swish-e -c /path/to/swish-e.conf -S prog

It starts and it looks like it is doing everything I want, then it
suddenly crashes with:
Looking at extracted tag '<td background="/images/verao_foo_d.jpg">'
! Found 0 links in
e68425e6454bb4e11d - Using DEFAULT (HTML2) parser -  (565 words)
err: External program failed to return required headers Path-Name: &

It always crashes in the same place. If I spider a different site, it
also crashes, and again always in the same place.
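
For reference, the error message names the two headers that swish-e expects an external program to print for every document when run with -S prog: Path-Name: and Content-Length:, followed by a blank line and then exactly that many bytes of content. The following is only a minimal sketch of that output format, not the spider's actual code; the helper name format_doc and the example URL are hypothetical and used purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of swish-e or its spider): format one
# document the way "-S prog" expects to read it from the program's stdout.
sub format_doc {
    my ($path, $content) = @_;

    # Content-Length is counted in bytes, so encode before measuring.
    utf8::encode($content) if utf8::is_utf8($content);

    return "Path-Name: $path\n"
         . "Content-Length: " . length($content) . "\n"
         . "\n"
         . $content;
}

binmode STDOUT;    # avoid newline translation changing the byte count

# Example URL is an assumption for illustration only.
print format_doc('http://localhost/catalog/example.html',
                 "<html><body>hello</body></html>\n");
```

If the byte count in Content-Length: does not match what is actually written, the reader would be expected to lose its place in the stream and find page content where the next Path-Name: header should be, which is one plausible way to arrive at an error like the one above.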
I've found this thread <> that is
related to my problem, but after reading it I became even more confused:
I now know that I may be looking at the wrong debug line because
of the buffering issues.

Can anyone explain what is happening and, hopefully, post a solution?

Received on Mon Mar 31 13:42:07 2003