Re: Spidering on Windows

From: Ron Klatchko <ron(at)not-real.ckm.ucsf.edu>
Date: Fri Oct 30 1998 - 17:14:12 GMT
David Norris wrote:
> It works fine.  However, I am having a weird little problem.  I don't think
> it is related to my changes.  The problem is 'URL disallowed by server' when
> it encounters a link to another location on my server.  It seems that the
> only URL allowed is http://myserver , even http://myserver/ is not allowed,
> neither is http://myserver/index.html

Swish behaves like a good spider and obeys robots.txt.  Looking at
yours:

> telnet illusionary.dyn.ml.org 80
Trying 207.40.214.2...
Connected to illusionary.dyn.ml.org.
Escape character is '^]'.
GET /robots.txt HTTP/1.0
HOST: illusionary.dyn.ml.org

HTTP/1.1 200 OK
[headers cut...]
Content-Type: text/plain

User-agent: *
Disallow: /
Allow /tomahawk

Hmm, I wonder what it could be...
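
To spell it out: `Disallow: /` under `User-agent: *` asks every robot to stay out of every path on the server, which would explain the 'URL disallowed by server' messages. Below is a minimal sketch for checking URLs against those rules. It assumes Python's standard `urllib.robotparser` as a stand-in for Swish's own robots.txt handling, which may differ in detail. The helper name `allowed` and the agent string "swish" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules as served in the transcript above, copied verbatim
# (including the "Allow" line, which has no colon).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Allow /tomahawk
"""


def allowed(url, agent="swish", robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under `robots_txt`.

    `agent` is a hypothetical user-agent name, not necessarily the
    string Swish actually sends.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://myserver/",
        "http://myserver/index.html",
        "http://myserver/tomahawk/",
    ):
        print(url, allowed(url))
```

With this parser every URL in the loop comes back False, /tomahawk included: the `Allow` line has no colon, so it is skipped, and `Disallow: /` matches any path.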

moo
----------------------------------------------------------------------
          Ron Klatchko - Manager, Advanced Technology Group           
           UCSF Library and Center for Knowledge Management           
                           ron@ckm.ucsf.edu
Received on Fri Oct 30 09:24:28 1998