RE: HTTP Crawler

From: Hsiao Ketung Contr 61 CS/SCBN <KETUNG.HSIAO(at)not-real.LOSANGELES.AF.MIL>
Date: Thu May 02 2002 - 16:27:39 GMT

David,

This is interesting.
There is a http://my-intranet-server-name/robots.txt file, and its
time stamp is June 1999, before I took this job.
I'll have to see what it does and whether I can temporarily remove or
rename it, and then try running swishspider again.

Its content is:

User-Agent: *
Disallow: /somedirectory/
Disallow: /somedirectory/
..
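
For reference, here is a minimal sketch of how rules like these are normally interpreted, using Python's standard urllib.robotparser module. It is not how swishspider itself does the check. The agent name "swishspider" and the helper is_allowed are assumptions for illustration only, and only the rule lines quoted above are included; whatever the ".." stands for is left out.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted above. "/somedirectory/" is the
# placeholder path from the message; the elided lines ("..") are not included.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /somedirectory/
Disallow: /somedirectory/
"""


def is_allowed(url, agent="swishspider", robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit `agent` to fetch `url`.

    The agent name is hypothetical; with "User-Agent: *" the rules
    apply to every crawler regardless of its name.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://my-intranet-server-name/tmp.html",
        "http://my-intranet-server-name/somedirectory/page.html",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Going only by the rules shown, a page at the server root such as /tmp.html would not be blocked, while anything under a disallowed directory would be. The elided lines could of course contain a rule that changes this.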

What does robots.txt do, and
what do you suggest?
Thanks.

 	Ketung Hsiao
 	Web Admin/Developer
 	310-363-6771
 

-----Original Message-----
From: David L Norris [mailto:dave@webaugur.com]
Sent: Wednesday, May 01, 2002 10:36 PM
To: Multiple recipients of list
Subject: [SWISH-E] RE: HTTP Crawler


On Wed, 2002-05-01 at 18:37, Hsiao Ketung Contr 61 CS/SCBN wrote:
> But if I run the following (from src directory)
> ./swishspider . http://my-intranet-server-name/tmp.html.

Is there a robot control file blocking the URL?
  http://my-intranet-server-name/robots.txt

-- 
 David Norris
  Dave's Web - http://www.webaugur.com/dave/
  Augury Net - http://augur.homeip.net/
  ICQ - 412039
Received on Thu May 2 16:27:45 2002