David,
This is interesting.
There is a http://my-intranet-server-name/robots.txt, and
its timestamp is June 1999, before I took this job.
I'll have to see what it does and if I can temporarily remove/rename it
and try to run swishspider again.
The content of it is:
User-Agent: *
Disallow: /somedirectory/
Disallow: /somedirectory/
..
What does robots.txt do, and
what's your suggestion?
Thanks.
Ketung Hsiao
Web Admin/Developer
310-363-6771
-----Original Message-----
From: David L Norris [mailto:dave@webaugur.com]
Sent: Wednesday, May 01, 2002 10:36 PM
To: Multiple recipients of list
Subject: [SWISH-E] RE: HTTP Crawler
On Wed, 2002-05-01 at 18:37, Hsiao Ketung Contr 61 CS/SCBN wrote:
> But if I run the following (from src directory)
> ./swishspider . http://my-intranet-server-name/tmp.html.
Is there a robot control file blocking the URL?
http://my-intranet-server-name/robots.txt
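
One way to check whether a given set of robots.txt rules would block a URL is to feed them to a parser and ask. The following is only a sketch in Python using the standard urllib.robotparser module, not how swishspider itself does it. The rules are the ones quoted in this thread, and the agent name "swishspider" and the page name page.html are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in this thread; "/somedirectory/" is the placeholder
# path from the message, not a real directory name.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /somedirectory/
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


base = "http://my-intranet-server-name"
# The agent name is an assumption; "User-Agent: *" applies to any agent.
print(is_allowed(ROBOTS_TXT, "swishspider", base + "/tmp.html"))
# page.html is a hypothetical file under the disallowed directory.
print(is_allowed(ROBOTS_TXT, "swishspider", base + "/somedirectory/page.html"))
```

With rules like these, a URL outside the disallowed directory should come back as allowed, so a Disallow line would only explain the failure if the URL being fetched sits under one of the listed paths.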
--
David Norris
Dave's Web - http://www.webaugur.com/dave/
Augury Net - http://augur.homeip.net/
ICQ - 412039
Received on Thu May 2 16:27:45 2002