
Re: Narrowing http spidering

From: Ron Klatchko <ron@ckm.ucsf.edu>
Date: Mon Nov 16 1998 - 17:32:48 GMT
Alexandre Gefen wrote:
> With the http option, the config file only allows limiting the number of
> links to be followed by the spider.
> Does anyone have an idea of how to narrow spidering to a subdirectory
> (http://myserveur/mydirectory/...), indexing all eventual subdirectories
> (http://myserveur/mydirectory/sub/sub/sub/example.html) but never going
> out to the main server (http://myserveur)? If there is any link in this
> subdirectory to any file on http://myserveur/, that link will be
> followed, which is very annoying. Is there some way to hack the C source
> or the Perl libraries used for spidering for this purpose?
> Also: is there some way to customize the files that will be scanned (by
> extension, as with the file option)?

If it's your own site, you can use robots.txt to keep the spider out of
the areas you don't want crawled.
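
For example, a minimal robots.txt at the server root (the sibling
directory names below are made up for illustration) would keep robots
out of everything except /mydirectory/. Note that the original 1994
robots exclusion standard has only a Disallow directive, so you can't
say "allow only /mydirectory/" directly; you have to list the paths to
block:

    # /robots.txt -- hypothetical layout: block everything but /mydirectory/
    User-agent: *
    Disallow: /images/
    Disallow: /cgi-bin/
    Disallow: /otherdir/

Of course, this only works if the spider actually honors the robots
exclusion standard, so check that first.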

moo
----------------------------------------------------------------------
          Ron Klatchko - Manager, Advanced Technology Group           
           UCSF Library and Center for Knowledge Management           
                           ron@ckm.ucsf.edu
Received on Mon Nov 16 09:34:05 1998