
Re: Any way to restrict URLs with http method since you can't use FileRules?

From: Mark Gaulin <gaulin(at)not-real.globalspec.com>
Date: Thu Apr 29 1999 - 21:06:39 GMT
I think the current method is to configure a robots.txt file in the web
site's main directory (http://www.website.com/robots.txt). Search the web
(or the swish docs, possibly) for "robot exclusion" to get started.
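
For example, a minimal robots.txt along those lines (the paths here are
just placeholders) keeps compliant crawlers out of two areas:

    # http://www.website.com/robots.txt
    User-agent: *
    Disallow: /private/
    Disallow: /cgi-bin/
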
Hope that helps.

Wish list:
Personally I would like to have a little more flexibility than that, since
I want to create multiple indexes of the same site, each with a different
set of pages that are off-limits.  Even better would be a way to specify,
based on regexes, pages that should be crawled but not indexed (so that
deeper pages can still be discovered).  So there would be a set of
Crawl-Stopping regexes and a set of DontIndex regexes; it could then, say,
index only the pages on the site with the word "help" or "index" in the URL.
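
Purely as an illustration, using made-up directive names (these are not
actual swish-e directives), that could look something like:

    # Hypothetical syntax -- not real swish-e directives
    CrawlStopRegex   ^http://www\.website\.com/private/  # never follow links into here
    DontIndexRegex   /nav/                               # crawl through, but keep out of the index
    DontIndexRegex   ^(?!.*(help|index))                 # skip any URL lacking "help" or "index"
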
Being able to specify multiple starting URLs would be cool too, especially
if they could be stored in a file external to the config file (in addition
to specifying multiple URLs in the config file).
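
Again purely as a sketch, with made-up directive names, the starting points
could come from both the config file and an external list:

    # in the config file:
    StartUrl      http://www.website.com/
    StartUrl      http://www.website.com/help/
    StartUrlFile  /usr/local/swish/start-urls.txt

    # /usr/local/swish/start-urls.txt -- one URL per line:
    http://support.website.com/
    http://docs.website.com/
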

	Mark

At 01:49 PM 4/29/99 -0700, john.leth@gulfaero.com wrote:
>Any way to restrict URLs when using http method since you can't use
>FileRules Directive?
>
>I'd like to be able to stop indexing of certain areas. My guess is that
>I will have to modify the swishspider perl program. Has anyone else done
>this already?
>
>-John Leth-Nissen
>Gulfstream Aerospace Corp.
> 
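
For the swishspider change John mentions, one approach would be to add a
URL filter near the top of the script.  Here is a rough sketch; it assumes
the stock interface (a local output path and the URL as the two arguments,
and a "<localpath>.response" file holding the HTTP status code), and the
patterns are just placeholders, so double-check against the actual script:

    #!/usr/bin/perl -w
    # Rough sketch only -- not the real swishspider.  Assumes the script is
    # called as: swishspider <localpath> <url>, and that swish reads the
    # HTTP status from "<localpath>.response".  Verify against your copy.
    use strict;

    my ($localpath, $url) = @ARGV;

    # Placeholder patterns for areas that should not be fetched/indexed.
    my @skip = (
        qr{/private/},
        qr{/cgi-bin/},
    );

    foreach my $pat (@skip) {
        if ($url =~ $pat) {
            # Report a non-200 status so swish skips this URL entirely.
            open(RESP, "> $localpath.response") || die "response file: $!";
            print RESP "404\n";
            close(RESP);
            exit 0;
        }
    }

    # ... the normal swishspider code (fetch $url, write the .response,
    #     .contents and .links files) would continue here ...
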
Received on Thu Apr 29 14:05:03 1999