Thanks, Brad! Yes, I had seen those lines in SwishSpiderConfig.pl before.
I am also wondering where the "2.2" that I see in the access logs is coming
from; the entries always read "swish-e spider 2.2 http://swish-e.org". I'm
curious to hear Bill's response to confirm this. I am not confident that this
is the whole answer, though: the access logs show long user-agent strings from
Yahoo, MSN and Google, yet the User-agent value you put in robots.txt to
exclude them is just a single short word (like psbot). So it seems there is
more to this.
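A likely explanation for the short tokens: robots.txt User-agent values are usually matched as substrings of the crawler's name, not compared against the whole user-agent string. As an illustration only, here is a sketch using Python's standard-library urllib.robotparser. That parser keeps the part of the agent string before the first "/", lower-cases it, and checks whether each User-agent token appears in it. The Swish-e spider does its own robots.txt handling in Perl, so its exact matching rules may differ. The robots.txt rules, the example.com URLs and the psbot agent string below are all made up for the demo.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each record names a crawler by one short token.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /

User-agent: swish-e
Disallow: /private/
"""

# Full user-agent strings of the kind that show up in access logs.
# The psbot string is made up for this demo.
SWISH_AGENT = "swish-e spider 2.2 http://swish-e.org/"
PSBOT_AGENT = "psbot/1.0 (hypothetical)"


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text already held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    checks = [
        (PSBOT_AGENT, "http://example.com/index.html"),
        (SWISH_AGENT, "http://example.com/private/report.html"),
        (SWISH_AGENT, "http://example.com/public/index.html"),
    ]
    for agent, url in checks:
        # can_fetch() keeps the agent text before the first "/", lower-cases
        # it, and tests whether each User-agent token is a substring of it.
        print(parser.can_fetch(agent, url), agent, url)
```

Run as-is, this prints False for the psbot request, False for the swish-e request under /private/, and True for the swish-e request under /public/. So a one-word token such as "swish-e" is enough to match the full "swish-e spider 2.2 ..." string.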
On 1/8/07, Brad Miele wrote:
>
> Pretty sure you can set agent in SwishSpiderConfig.pl, yep, line 143:
>
> agent => 'swish-e spider http://swish-e.org/'
>
> regards,
>
> Brad
> ---------------------
> Brad Miele
> VP Technology
>
> On Mon, 8 Jan 2007, James wrote:
>
> > Is there a way for other webmasters to disallow Swish-e from crawling their
> > site(s), and is there a way to declare what bot I am? For instance, I always
> > put the following in my robots.txt files for my web sites:
> >
> > User-agent: psbot
> > Disallow: /
> >
> > Is there some kind of configuration file that declares what bot (User-agent)
> > I am when using Swish-e, and can that be changed to something I customize
> > and declare publicly so that anyone can disallow my user agent?
> >
> > I ask these things in general because I know that Swish-e has a polite
> > spider that obeys robots.txt and noindex/nofollow directives.
> >
>
Received on Mon Jan 8 05:52:55 2007