Re: robots.txt

From: J Robinson <jrobinson852(at)not-real.yahoo.com>
Date: Mon Oct 31 2005 - 14:51:02 GMT

Thanks, Bill;

The actual complaint is that the spider is indexing
pages it shouldn't.

I'll check out the 'skipped' debug flag -- is there
another that actually shows urls being compared
against the robots.txt contents?

Thanks again
  jrobinson

--- Bill Moseley <moseley@hank.org> wrote:

> On Mon, Oct 31, 2005 at 06:34:59AM -0800, J Robinson wrote:
> > Any tips on how I can debug this? Is there a debug
> > flag for spider.pl that shows robots.txt being parsed
> > and/or urls being matched against it, or anything like
> > that?
> 
> Set the debug to "skipped" and it will tell you when
> a file is skipped due to robots.txt.
> 
> Then just run the spider on one file they say it's
> skipping.
> 
> When I've debugged this in the past I found that the
> robots.txt file was not set up correctly.
> 
> -- 
> Bill Moseley
> moseley@hank.org
> 
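
As an independent cross-check on whether the robots.txt file is set up correctly, the rules can be tested outside the spider. The following is a minimal sketch using Python's standard urllib.robotparser, not spider.pl's own matching code, so the two may disagree on edge cases. The robots.txt contents, the agent name "ExampleSpider", and the example.com URLs are all hypothetical placeholders; substitute the real file, the agent string the spider sends, and the pages in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; paste in the real file from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def check_urls(robots_txt, agent, urls):
    """Return {url: allowed}, where allowed is True if the agent may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    # Hypothetical URLs; replace with pages the spider should or should not index.
    urls = [
        "http://example.com/index.html",
        "http://example.com/private/report.html",
    ]
    for url, allowed in check_urls(ROBOTS_TXT, "ExampleSpider", urls).items():
        print("allowed " if allowed else "excluded", url)
```

If a page that is being indexed shows up as allowed here, the robots.txt rules themselves are the likely culprit rather than the spider.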



		
Received on Mon Oct 31 06:51:03 2005