From: Matt Kynaston <Matt.Kynaston(at)not-real.etbroker.com>
Date: Wed Sep 25 2002 - 15:48:03 GMT
Hi there,

I've just upgraded from 2.2rc1 to 2.2 and have run into problems using
spider.pl. Platform is Windows2000/IIS, ActivePerl 5.6.1, Windows binary
version of swish-e 2.2.

When I run swish-e -S prog -c \path\to\config it launches the spider.pl
program fine. The same command worked perfectly in rc1, but now I get the
following errors:

Warning: Unknown header line: '<html>' from program prog-bin/spider.pl
Warning: Unknown header line: '<title>Untitled</title>' from program prog-bin/spider.pl
Warning: Unknown header line: '<meta http-equiv="Content-Type" content="text/html;">' from program prog-bin/spider.pl
Warning: Unknown header line: '<!-- don't let swish index contents of this page,
Warning: Unknown header line: '<meta name="robots" content="nocontents">' from program prog-bin/spider.pl
type="text/css">' from program prog-bin/spider.pl
err: External program failed to return required headers Path-Name: & Content-Length:
.

Clearly it's not recognising the headers it returns. So I had a look at what
spider.pl is outputting - the headers are all there:

Path-Name: http://dev.ecostas.com/swish.php
Content-Length: 29538
No-Contents: 1

<html>
[snip]

After searching the archives I found a note about binmode STDOUT to force
unix-style new lines, and tried adding that. spider.pl does what it should
(\n newlines), but swish-e still doesn't like it.
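
For anyone comparing notes, here is a minimal sketch of the output format described above, written in Python rather than Perl. It is not spider.pl's code. The function names format_document and emit are made up for illustration, the header names are the ones shown in the output above, and treating Content-Length as a byte count of the body is my assumption. Writing through sys.stdout.buffer is the rough Python counterpart of binmode STDOUT: it bypasses text-mode newline translation on Windows.

```python
import sys


def format_document(path_name: str, body: bytes, no_contents: bool = False) -> bytes:
    """Build one document record: headers, a blank line, then the body.

    Hypothetical helper for illustration only.
    """
    headers = [
        f"Path-Name: {path_name}",
        # Assumption: the length is counted in bytes of the body.
        f"Content-Length: {len(body)}",
    ]
    if no_contents:
        headers.append("No-Contents: 1")
    # Join with bare LF; the empty line ("\n\n") ends the header section.
    head = "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + body


def emit(record: bytes) -> None:
    """Write the record as raw bytes so "\n" is not turned into "\r\n"."""
    sys.stdout.buffer.write(record)
    sys.stdout.buffer.flush()
```

Calling emit(format_document("http://dev.ecostas.com/swish.php", page_bytes, no_contents=True)) would write the three headers, a blank line, and then the page exactly as stored in page_bytes.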

Then I tried running it with -T PROPERTIES to see what it thinks it's
getting, and find just before all the warnings above:
swishdocpath: 6 ( 32) S: "http://dev.ecostas.com/swish.php"
swishdocsize: 8 (  4) N: "0000000029538"

So the headers are being parsed?! Why does it say they weren't returned?

OK, ignore that for a sec. It also seems like it's not recognising the blank
line at the end of the headers - though it does recognise the line that
appears after </head>. I popped open the hex editor and had a look. Directly
after No-Contents: 1 I get 0A 0A - \n\n. Directly after </head> I get 0D 0A
0D 09 0D 0A - \r\n\t\r\n.
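
The hex-editor check can also be scripted. The sketch below is only an illustration of that check, with hypothetical helper names (hex_dump, header_terminator); it says nothing about how extprog.c actually detects the end of the headers. It finds the first blank line in a captured byte stream and reports which bytes form it.

```python
from typing import Optional, Tuple


def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex, e.g. '0A 0A'."""
    return " ".join(f"{b:02X}" for b in data)


def header_terminator(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, terminator) for the first blank line, or None.

    Only LF LF and CR LF CR LF are checked; other mixes are not handled.
    """
    candidates = [(data.find(t), t) for t in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, t) for pos, t in candidates if pos != -1]
    if not found:
        return None
    # Whichever terminator occurs earliest in the stream wins.
    return min(found, key=lambda item: item[0])


if __name__ == "__main__":
    sample = (
        b"Path-Name: http://dev.ecostas.com/swish.php\n"
        b"Content-Length: 29538\n"
        b"No-Contents: 1\n\n<html>"
    )
    result = header_terminator(sample)
    if result is not None:
        offset, term = result
        print(f"headers end at byte {offset}: {hex_dump(term)}")
```

Feeding it the real output of spider.pl, captured to a file in binary mode, would show whether the header section ends in 0A 0A or 0D 0A 0D 0A.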

I've had a look at extprog.c to see what might be going on, but am stumped.

Is anyone else having these problems? Are the headers parsed or not? Why
isn't it recognising the end of the header section? Or am I doing something
totally stupid (other than using Windows ;)?

Cheers, and thanks for the (otherwise) great software,

Matt

Received on Wed Sep 25 15:51:33 2002