Re: Disallow in Robots.txt

From: Bill Moseley <moseley(at)>
Date: Mon Jan 08 2007 - 14:46:54 GMT
On Mon, Jan 08, 2007 at 05:51:33AM -0800, James wrote:
> Thanks, Brad!  Yes, I had seen those lines in before.
> I am also wondering where the "2.2" is being generated from (that I see in
> the access logs).  I always see swish-e spider 2.2
> ..I'll be curious to get Bill's response to this, to confirm.  I am not
> confident that this is the total answer, since I always see a whole lot
> written in the access logs from Yahoo, MSN and Google and yet their
> UserAgent is just a one word (short) term to exclude (like the psbot) in the
> Robots.txt.  So, it seems there is more to this.

Good point -- that agent string has not been updated in quite a while.

I guess I would just try it and see what happens.  Or look at
the source:

A quick look around shows:

robots.txt is parsed as:

        elsif (/^\s*User-Agent\s*:\s*(.*)/i) {
            $ua = $1;
            $ua =~ s/\s+$//;

and that value is then tested against the robot's own name:

# Returns TRUE if the given name matches the
# name of this robot
sub is_me {
    my($self, $ua_line) = @_;
    my $me = $self->agent;

    # See whether my short-name is a substring of the
    #  "User-Agent: ..." line that we were passed:
    if(index(lc($me), lc($ua_line)) >= 0) {
      LWP::Debug::debug("\"$ua_line\" applies to \"$me\"")
       if defined &LWP::Debug::debug;
      return 1;
    }
    else {
      LWP::Debug::debug("\"$ua_line\" does not apply to \"$me\"")
       if defined &LWP::Debug::debug;
      return '';
    }
}
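
To make the matching concrete, here is a minimal standalone sketch of
the two steps above.  It is not the library code: ua_from_line and
record_applies are names made up for illustration, and the agent name
'swish-e spider' is only an assumed value for what $self->agent
returns.  The point is the argument order of index(): the value from
robots.txt has to occur inside the robot's own name (ignoring case),
not the other way around.

```perl
use strict;
use warnings;

# Pull the value out of a "User-Agent:" line, using the same pattern as
# the excerpt above.  Returns undef for any other kind of line.
sub ua_from_line {
    my ($line) = @_;
    return undef unless $line =~ /^\s*User-Agent\s*:\s*(.*)/i;
    my $ua = $1;
    $ua =~ s/\s+$//;    # drop trailing whitespace
    return $ua;
}

# Same test as in is_me: true when the robots.txt value occurs somewhere
# inside the robot's own name, compared case-insensitively.
sub record_applies {
    my ($my_name, $ua_line) = @_;
    return index(lc($my_name), lc($ua_line)) >= 0 ? 1 : '';
}

# Assumption: a stand-in for whatever $self->agent returns.
my $my_name = 'swish-e spider';

for my $line (
    'User-agent: swish-e',
    'User-agent: Spider  ',
    'User-agent: psbot',
    'User-agent: swish-e spider 2.2',
) {
    my $ua = ua_from_line($line);
    printf "%-24s %s\n", "'$ua'",
        record_applies($my_name, $ua) ? 'applies' : 'does not apply';
}
```

With that assumed name, the first two lines apply and the last two do
not.  A short token in robots.txt matches, while a longer string that
includes the version number does not, because it is not contained in
the robot's name.  The sketch leaves out everything else a real parser
does, such as handling "User-agent: *".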

Bill Moseley

Received on Mon Jan 8 06:46:55 2007