Re: Problems with spider.pl on windows 98 SE

From: Adam Edelman <aedelma(at)not-real.tulane.edu>
Date: Wed Feb 13 2002 - 18:41:41 GMT
> Huh?  Why is
>
>  Path-Name: http://arena.internet2.edu:80/sample.htm
>  Content-Length: 33
>  Last-Mtime: 1013569857
>  <HTML>Sample document</HTML>
>
> showing up?  That's stdout from the spider.cgi script that should be
> captured by swish that's running the spider.  You will note that was not in
> my example.
>

I did just notice that.  I'm curious how swish reads from stdout.
I can capture the web documents to be indexed in a single file by putting
this in the swish config file:
SwishProgParameters spider.pl>output.txt

Then the file output.txt looks something like this:
Path-Name: http://arena.internet2.edu:80/index.html
Content-Length: 17774
Last-Mtime: 1011279959
<HTML>....code for page here...</HTML>
Path-Name: http://arena.internet2.edu:80/html/contribute.html
Content-Length: 11467
Last-Mtime: 1011279964
<HTML>....more html code here...</HTML>
..etc for all web pages spidered

Would there be some way (function call in swish?) to get swish to read from
output.txt as if it were being directly passed from spider.pl in stdout so
that the effect (multiple web pages indexed) would be the same? Thanks.
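
To make the idea concrete, here is a rough sketch of a small replay script that splits a captured file like output.txt back into its individual documents and writes them to stdout again. It is in Python only for illustration. The names iter_records and replay.py are made up, and it assumes that Content-Length counts the bytes of the body and that a blank line may or may not follow the headers. Whether swish can be pointed at such a script in place of spider.pl is exactly the open question above, so treat it as an untested idea rather than a known solution.

```python
import re
import sys

# Matches one "Name: value" header line, e.g. b"Content-Length: 33\n".
HEADER = re.compile(rb"([A-Za-z-]+):[ \t]*([^\r\n]*)\r?\n")
BLANK = re.compile(rb"\r?\n")


def iter_records(data):
    """Yield (headers, body) for each document in captured spider output."""
    pos = 0
    while pos < len(data):
        # Skip blank lines left over between records.
        blank = BLANK.match(data, pos)
        if blank:
            pos = blank.end()
            continue
        # Collect consecutive header lines (Path-Name, Content-Length, ...).
        headers = {}
        match = HEADER.match(data, pos)
        while match:
            name = match.group(1).decode("ascii")
            headers[name] = match.group(2).decode("utf-8", "replace")
            pos = match.end()
            match = HEADER.match(data, pos)
        if "Content-Length" not in headers:
            raise ValueError("no Content-Length header at byte %d" % pos)
        # Assumption: an optional blank line separates headers from body.
        blank = BLANK.match(data, pos)
        if blank:
            pos = blank.end()
        # Assumption: Content-Length is the exact byte count of the body.
        length = int(headers["Content-Length"])
        body = data[pos:pos + length]
        pos += length
        yield headers, body


# Hypothetical usage: python replay.py output.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        captured = handle.read()
    out = sys.stdout.buffer
    for headers, body in iter_records(captured):
        for name, value in headers.items():
            out.write(("%s: %s\n" % (name, value)).encode("utf-8"))
        # Writing a blank line before the body is an assumption about
        # what the reader on the other end of the pipe expects.
        out.write(b"\n")
        out.write(body)
```

Note that on Windows the file has to be opened in binary mode, as above, or the byte counts in Content-Length will not line up with what is actually read.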

Adam
Received on Wed Feb 13 18:42:26 2002