Re: Does the <!-- Swishcommand noindex --> work whe

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 25 2003 - 18:15:41 GMT
On Wed, Jun 25, 2003 at 12:53:40PM -0500, Cleveland@mail.winnefox.org wrote:
> > Ah, yes.  Remove your blank lines.  A blank line separates sections
> > based on the user agent.
> 
> Hm. Still not working. I looked at CNN's robots.txt and I noticed they
> didn't have multiple directories listed. Just /directory or file.html,
> not /directory/directory/file.html. Is it OK to list subfolders?

Yes, see: http://www.robotstxt.org/wc/norobots.html
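
For example, a robots.txt along these lines (the paths and robot name
below are made up, not taken from your site) keeps each record's rules
together and uses a single blank line only to separate one user-agent
record from the next; a Disallow path can point at a subdirectory or at
an individual file:

    User-agent: *
    Disallow: /citydirs/1857/1857full.pdf
    Disallow: /private/reports/

    User-agent: SomeOtherBot
    Disallow: /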


> Also, I
> have the spider only looking at www.oshkoshpubliclibrary.org/citydirs.
> Could that be the problem?

No.  Try this out (changing $site to your own site).  This is from the
WWW::RobotRules man page.

moseley@bumby:~/apache$ cat r.pl

use WWW::RobotRules;
use LWP::Simple qw(get);

my $site = 'http://bumby';

# One rules object per robot (user agent) name.
my $rules = WWW::RobotRules->new('MOMspider/1.0');

# Fetch the site's robots.txt and hand it to the parser.
{
    my $url = "$site/robots.txt";
    my $robots_txt = get $url;
    print "==========\n$robots_txt=========\n";
    $rules->parse( $url, $robots_txt ) if defined $robots_txt;
}

my @tests = (
    "$site/citydirs/1857/1857full.pdf",
    "$site/citydirs/1857/1857fullx.pdf",
);

for (@tests) {
    print $rules->allowed($_) ? "allowed" : "not allowed";
    print " $_\n";
}


And here's the output:


moseley@bumby:~/apache$ perl r.pl
==========
User-agent: *
Disallow: /citydirs/1857/1857full.pdf
=========
not allowed http://bumby/citydirs/1857/1857full.pdf
allowed http://bumby/citydirs/1857/1857fullx.pdf
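
As a side note, WWW::RobotRules matches Disallow lines as path prefixes,
so disallowing a directory blocks everything beneath it.  Here's a quick
sketch of that (not from the man page, and the paths are made up); it
should print "not allowed" for the first URL and "allowed" for the second:

    use WWW::RobotRules;

    my $site  = 'http://bumby';
    my $rules = WWW::RobotRules->new('MOMspider/1.0');

    # Parse an inline robots.txt instead of fetching one over HTTP.
    $rules->parse( "$site/robots.txt",
        "User-agent: *\nDisallow: /citydirs/1857/\n" );

    for my $url ( "$site/citydirs/1857/1857full.pdf",
                  "$site/citydirs/index.html" ) {
        print $rules->allowed($url) ? "allowed" : "not allowed", " $url\n";
    }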

-- 
Bill Moseley
moseley@hank.org
Received on Wed Jun 25 18:15:47 2003