This is interesting.
There is a http://my-intranet-server-name/robots.txt, and
its timestamp is June 1999, before I took this job.
I'll have to see what it does, and whether I can temporarily remove or rename it
and then try running swishspider again.
The content of it is:
What does robots.txt do, and
what's your suggestion?
From: David L Norris [mailto:email@example.com]
Sent: Wednesday, May 01, 2002 10:36 PM
To: Multiple recipients of list
Subject: [SWISH-E] RE: HTTP Crawler
On Wed, 2002-05-01 at 18:37, Hsiao Ketung Contr 61 CS/SCBN wrote:
> But if I run the following (from src directory)
> ./swishspider . http://my-intranet-server-name/tmp.html.
Is there a robot control file blocking the URL?
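
One way to check this without touching the file on the server is to test its
rules against the URL. Here is a minimal sketch using Python's standard
urllib.robotparser module. The rules in SAMPLE_RULES are hypothetical (the
real file's contents are not shown in this thread), the is_blocked helper is
made up for illustration, and the "*" user-agent is an assumption, since the
agent name swishspider sends may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; substitute the lines of the real file.
SAMPLE_RULES = """\
User-agent: *
Disallow: /
""".splitlines()


def is_blocked(rules, url, agent="*"):
    """Return True if the given robots.txt lines disallow `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse rules supplied as a list of lines
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://my-intranet-server-name/tmp.html"
    print("blocked" if is_blocked(SAMPLE_RULES, url) else "allowed")
```

If the real rules report the URL as blocked, a robots.txt entry would be a
plausible reason for the spider skipping the page.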
Dave's Web - http://www.webaugur.com/dave/
Augury Net - http://augur.homeip.net/
ICQ - 412039
Received on Thu May 2 16:27:45 2002