
Re: spidering with swish

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jan 05 2005 - 21:03:41 GMT
On Wed, Jan 05, 2005 at 12:15:48PM -0800, Lance Perry wrote:
> I am spidering a site (spidering is being called from the swish indexing).
> 
> The site contains .exe and .zip files. I DO NOT want those files to be
> indexed (or even downloaded).

Do it the same way as the example in the spider.pl docs that skips .gif,
.jpeg and .png files, but specify \.exe and \.zip instead. Or use
robots.txt to list the files.
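The spider.pl example is Perl (a URL-testing callback in the spider config, test_url if memory serves). The idea is simply a case-insensitive regex on the URL path, checked before the spider fetches anything, so matching files are never downloaded. Below is a minimal Python sketch of that same test, not spider.pl itself. The helper name `should_fetch` and the example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper: same idea as the spider.pl docs example that skips
# images by extension, but matching \.exe and \.zip instead.
SKIP_EXTENSIONS = re.compile(r"\.(?:exe|zip)$", re.IGNORECASE)

def should_fetch(url: str) -> bool:
    """Return False for URLs whose path ends in .exe or .zip."""
    # Test only the path, so a query string like ?v=2 does not hide the extension.
    path = urlparse(url).path
    return SKIP_EXTENSIONS.search(path) is None

if __name__ == "__main__":
    for url in (
        "http://example.com/downloads/cisco-vpn/client.exe",
        "http://example.com/downloads/archive.ZIP",
        "http://example.com/index.html",
    ):
        print(url, should_fetch(url))
```

Running the check before the request, rather than after, is what keeps the .exe and .zip files from being downloaded at all.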

> 
> --robots.txt--
> User-agent: *
> Disallow: /downloads/cisco-vpn/*.exe$

That's not valid robots.txt syntax.  You can't use regex patterns; a
Disallow value is just a plain path prefix.
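Python's standard-library `urllib.robotparser` also treats Disallow values as literal path prefixes, so it is a handy way to check what a rule actually blocks. It is not the parser spider.pl uses; this is only an illustration. The URL, the agent string, and the `allowed` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str, agent: str = "swish-e spider") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

URL = "http://example.com/downloads/cisco-vpn/client.exe"

# The rule from the question: '*' and '$' are taken literally as part of
# the prefix, so it does not match client.exe at all.
wildcard_rules = """User-agent: *
Disallow: /downloads/cisco-vpn/*.exe$
"""

# A plain prefix blocks everything under that directory.
prefix_rules = """User-agent: *
Disallow: /downloads/cisco-vpn/
"""

print(allowed(wildcard_rules, URL))  # True: not blocked
print(allowed(prefix_rules, URL))    # False: blocked
```

Since robots.txt can only express prefixes, listing the download directory (or each file) is the way to keep the spider out; matching by extension needs the spider-side test instead.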

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Wed Jan 5 13:03:47 2005