Hi,
Swish-e no longer updates the index after I run spider.pl.
It ran perfectly for more than two years, but it has started to abort
with an error I originally saw when I first started using it.
This is on Linux.
I first run this on the command line:
/usr/local/lib/swish-e/spider.pl > output.txt
and then
swish-e -c swish.conf -S prog -i stdin < output.txt
but this aborts after a while with:
Warning: Unknown header line: 'om/linking/' from program stdin
err: External program failed to return required headers Path-Name:
I have tried each of these three options individually in spider.pl:
my $bytecount = length pack 'C0a*', $$content;
my $bytecount = length($$content);
use bytes;
$bytecount = length $$content;
and get the same result.
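
For reference, the difference those three options are getting at can be
shown outside spider.pl. This is only a small Python illustration, not
the spider's Perl code: it assumes the content is UTF-8 and that
Content-Length is meant to be a byte count. The helper name make_record
is made up for the example.

```python
def make_record(url: str, text: str) -> bytes:
    """Build one record: headers, a blank line, then the encoded body."""
    # Encode first, then measure, so Content-Length counts bytes.
    body = text.encode("utf-8")
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (url, len(body))
    return header.encode("ascii") + body


# A short string with two non-ASCII characters (e-acute and an en dash).
text = "caf\u00e9 \u2013 menu"

char_count = len(text)                   # counts characters
byte_count = len(text.encode("utf-8"))   # counts bytes actually written

record = make_record("http://www.site.com/index.htm", text)

print(char_count, byte_count)
```

If the two counts differ for a page, a length taken in characters would
be smaller than the number of bytes that actually end up in output.txt.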
If I look at the output.txt file, I can see that for some of the entries
"Path-Name" is not on a line of its own, but instead sits right next
to the closing </html> tag of the previous entry,
e.g.
<!-- InstanceEnd -->
</html>Path-Name: http://www.site.com/index.htm
instead of
<!-- InstanceEnd -->
</html>
Path-Name: http://www.site.com/index.htm
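
To narrow down which page is involved, a script along these lines might
help. It is a rough Python sketch, not part of Swish-e, and it assumes
that each record in output.txt is a block of header lines, a blank
line, and then exactly Content-Length bytes of content. The function
name find_first_bad_record is made up for illustration.

```python
import sys


def find_first_bad_record(data: bytes):
    """Return (record_number, previous_path, problem), or None if all parse."""
    pos = 0
    record = 0
    previous_path = None  # Path-Name of the last record that parsed cleanly
    while pos < len(data):
        record += 1
        headers = {}
        # Header lines run up to the first empty line.
        while True:
            end = data.find(b"\n", pos)
            if end == -1:
                return (record, previous_path, "stream ended inside headers")
            line = data[pos:end].rstrip(b"\r")
            pos = end + 1
            if not line:
                break
            name, sep, value = line.partition(b":")
            if not sep:
                return (record, previous_path, "not a header line: %r" % line)
            headers[name.strip().lower()] = value.strip()
        if b"path-name" not in headers or b"content-length" not in headers:
            return (record, previous_path,
                    "missing Path-Name or Content-Length")
        try:
            length = int(headers[b"content-length"])
        except ValueError:
            return (record, previous_path, "Content-Length is not a number")
        if pos + length > len(data):
            return (record, previous_path, "content shorter than Content-Length")
        previous_path = headers[b"path-name"].decode("latin-1")
        pos += length  # skip the body by byte count, not by lines
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(find_first_bad_record(handle.read()))
```

It could be run as "python3 check_output.py output.txt" (the script
name is hypothetical). The path it reports belongs to the last record
that parsed cleanly, so that page's Content-Length would be the first
thing to compare against the bytes actually written for it.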
Is it doing this because some of the pages don't end with a newline,
or does this have to do with page encoding or the multi-byte issue
I've seen mentioned?
Since nothing has been changed on the server, I assume it must be an
issue with some of the web pages?
I am stuck, please help. Thanks
Received on Fri Oct 07 2011 - 13:29:37 GMT