On Thu, 2002-05-02 at 11:27, Hsiao Ketung Contr 61 CS/SCBN wrote:
> This is interesting.
> There is http://my-intranet-server-name/robots.txt and
> the timestamp of robots.txt is June 1999, before I took this job.
> I'll have to see what it does and if I can temporarily remove/rename it
> and try to run swishspider again.
You could add a line, or several lines, allowing/disallowing SWISH-E
access to specific URLs. As Bill suggested, the robotstxt.org site
should be rather helpful in explaining it.
> The content of it is:
>
> User-Agent: *
> Disallow: /somedirectory/
> Disallow: /somedirectory/
> ..
Yep, that's probably the problem.
The current spider's User Agent is:
SwishSpider http://swish-e.org
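To see why the current file trips up the spider, here is a rough check. It uses Python's standard-library `urllib.robotparser` as a stand-in for the spider's own robots.txt handling, which may differ in details. The host name and `/somedirectory/` are the placeholders from the quoted message. The `allowed` helper and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above; "/somedirectory/" is the placeholder from
# the message, and the elided ".." lines are simply left out here.
CURRENT_ROBOTS_TXT = """\
User-Agent: *
Disallow: /somedirectory/
"""

SWISH_UA = "SwishSpider http://swish-e.org"
HOST = "http://my-intranet-server-name"


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Ask Python's robots.txt parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, HOST + path)


# No record names SwishSpider, so it falls under "User-Agent: *".
print(allowed(CURRENT_ROBOTS_TXT, SWISH_UA, "/somedirectory/index.html"))  # False
print(allowed(CURRENT_ROBOTS_TXT, SWISH_UA, "/index.html"))                # True
```

So the spider is only shut out of the disallowed directories, which matches what you're seeing if the pages you expected live there.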
You can probably add these two lines to the top of your robots.txt, with a blank line after them so they form a separate record from the existing "User-Agent: *" one:
User-Agent: SwishSpider
Disallow:
That will allow SwishSpider access to everything while keeping the existing restrictions in place for other bots. You might need to use "SwishSpider*" but probably not.
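A quick sanity check of the patched file, again using Python's `urllib.robotparser` as a stand-in rather than swishspider's own code. The layout (SwishSpider record first, then a blank line, then the original record) is an assumption about how you'd edit the file. `allowed`, `SomeOtherBot/1.0` and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical patched robots.txt: the suggested SwishSpider record goes
# first, then a blank line (which ends that record), then the original one.
PATCHED_ROBOTS_TXT = """\
User-Agent: SwishSpider
Disallow:

User-Agent: *
Disallow: /somedirectory/
"""

HOST = "http://my-intranet-server-name"


def allowed(user_agent: str, path: str) -> bool:
    """Check one fetch against the patched file with Python's parser."""
    parser = RobotFileParser()
    parser.parse(PATCHED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, HOST + path)


# An empty "Disallow:" means "allow everything" for the matching record, and
# Python's parser matches "SwishSpider" against the full agent string with
# no wildcard needed.
print(allowed("SwishSpider http://swish-e.org", "/somedirectory/page.html"))  # True
# Any other robot still falls through to "User-Agent: *".
print(allowed("SomeOtherBot/1.0", "/somedirectory/page.html"))  # False
```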
--
David Norris
Dave's Web - http://www.webaugur.com/dave/
Augury Net - http://augur.homeip.net/
ICQ - 412039
Received on Thu May 2 18:24:27 2002