Hi, folks,
I've just blanked out robots.txt in the root directory of my intranet server and
tried ./swishspider again, and I still get a 500 in ..response.
(Internal Error 500
The server encountered an unexpected condition which prevented it from
fulfilling the request.).
I think it's because I'm running swishspider from our internet server, which
is outside the firewall, and of course I can't get through the firewall to
our intranet.
I had a feeling there is no way to get around that.
I'll just have to install swish-e on our intranet server.
Please let me know if I'm wrong.
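
One way to test the firewall theory before reinstalling anything is to check whether the machine running swishspider can open a TCP connection to the intranet web server at all. The following is only a minimal sketch in Python, not part of swish-e; the function name can_connect, the default port 80, and the placeholder host name in the usage note are all assumptions for illustration.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only checks reachability
    and does not send an HTTP request.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

Calling can_connect("intranet.example") from the internet server (with the real intranet host name substituted for the placeholder) should return False if the firewall is blocking the connection, which would be consistent with the spider failing regardless of what robots.txt contains.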
Thanks for all the responses.
-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Thursday, May 02, 2002 9:51 AM
To: Multiple recipients of list
Subject: [SWISH-E] RE: HTTP Crawler
At 09:27 AM 05/02/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>User-Agent: *
>Disallow: /somedirectory/
>Disallow: /somedirectory/
>..
>
>What does robots.txt do, and
>what's your suggestion?
Google is your friend.
http://www.robotstxt.org/wc/robots.html
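
In short, robots.txt tells well-behaved crawlers which paths they should not fetch. As a rough illustration of how rules like the ones quoted above are interpreted, here is a small sketch using Python's standard urllib.robotparser module. This is not how swishspider itself is implemented; the helper name allowed, the agent name "swishspider", and the host intranet.example are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, url, agent="swishspider"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    `rules` is a list of robots.txt lines; an empty list stands in for a
    blank robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(rules)  # parse the lines directly, no network access
    return parser.can_fetch(agent, url)


# The rules from the quoted robots.txt.
rules = [
    "User-Agent: *",
    "Disallow: /somedirectory/",
]

# Paths under /somedirectory/ are off limits to every crawler.
blocked = allowed(rules, "http://intranet.example/somedirectory/page.html")

# Anything else is allowed.
open_path = allowed(rules, "http://intranet.example/other/page.html")

# A blank robots.txt places no restrictions at all.
blank = allowed([], "http://intranet.example/somedirectory/page.html")

print(blocked, open_path, blank)
```

Under these assumptions the three checks come out as False, True, and True, so blanking the file only removes restrictions; it cannot cause a server error by itself.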
If you were to use -S prog with spider.pl, you could tell it to ignore
robots.txt. But I'd suggest you get the -S http method working first
before trying to tackle the -S prog / spider.pl setup with swish.
--
Bill Moseley
mailto:moseley@hank.org
Received on Thu May 2 18:24:10 2002