Change to spider.pl to support Basic authentication

From: Darryl Friesen <Darryl.Friesen(at)not-real.usask.ca>
Date: Sat Feb 02 2002 - 07:06:22 GMT
I looked through the archives and docs a bit (granted, not very thoroughly)
and didn't find much on spidering sites protected by Basic
authentication, so I made this two-line change to the spider.pl program that
came with SWISH-E 2.1-dev-25.  (sorry about the long lines here)

[moondog] diff spider.orig.pl spider.pl
268a269
>     $request->authorization_basic( split(':', $server->{credentials}, 2) ) if ($server->{credentials});
661a663
>         $request->authorization_basic( split(':', $server->{credentials}, 2) )  if ($server->{credentials});
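
For anyone curious what those two added lines end up sending: as far as I
know, authorization_basic() in LWP builds a header of the form
"Authorization: Basic <base64 of user:password>".  The sketch below mimics
that using only core modules.  basic_auth_header() is a made-up helper for
illustration, not something in spider.pl.  The limit of 2 on split is what
lets the password itself contain a colon.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of spider.pl): turn a 'user:password'
# string into the value of an HTTP Basic "Authorization" header.
sub basic_auth_header {
    my ($credentials) = @_;

    # Mirror the "if ($server->{credentials})" guard from the diff.
    return unless defined $credentials && length $credentials;

    # Limit of 2: only the first ':' separates user from password.
    my ( $user, $pass ) = split /:/, $credentials, 2;
    $pass = '' unless defined $pass;

    # Empty second argument to encode_base64 suppresses the trailing newline.
    return 'Basic ' . encode_base64( "$user:$pass", '' );
}

print basic_auth_header('username:pass:word'), "\n";
```

Without the limit, split would break 'username:pass:word' into three pieces
and the password would be cut off at its first colon.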


For this to work, you need to run swish-e something like this:

    ./swish-e -c swish.config -S prog

with a swish.config that looks a bit like:

    IndexDir ./spider.pl
    SwishProgParameters ./spider.config

and you need to add an extra config option to your spider.config file like
this:

    credentials => 'username:password'
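
In context, the option sits inside a server entry.  Here is a rough sketch of
a spider.config, assuming your copy uses the @servers list of hashes that the
patched code reads as $server.  The base_url and email keys and their values
are placeholders I am assuming from the sample config; only the credentials
line is what the patch above looks for.

```perl
# Sketch of a spider.config entry (an assumption about the layout, so
# check it against the sample config that came with your spider.pl).
@servers = (
    {
        base_url    => 'http://localhost/protected/',   # placeholder URL
        email       => 'you@localhost',                 # placeholder contact address
        credentials => 'username:password',             # read by the patched spider.pl
    },
);

1;
```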

In case you're wondering, yes, I made that up.  You won't find it in the docs
or FAQ.  And yes, it's a bit insecure, but with the right file permissions
on your config files, it shouldn't be the end of the world.
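
By "the right file permissions" I mean something like owner-only access
(mode 0600) on the file holding the password.  A minimal sketch of checking
that from Perl, assuming a Unix-like system; config_is_private() is a
hypothetical helper, not part of SWISH-E:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file has no group or world permission bits.
sub config_is_private {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!";
    my $mode = $st[2] & 07777;       # keep only the permission bits
    return ( $mode & 0077 ) == 0;    # no bits set for group or others
}

# Usage: perl check_perms.pl ./spider.config   (script name is made up)
if (@ARGV) {
    my $config = $ARGV[0];
    print config_is_private($config)
        ? "$config is readable by its owner only\n"
        : "$config is accessible to group or others; consider chmod 600\n";
}
```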

I offer this up to whoever wants it.  It seems to work fine here for me,
but I haven't tested it thoroughly.  There are likely other, and possibly
more secure, ways to accomplish this (and I wouldn't be opposed to hearing
them), but hey, it's 12:30am here so it's the best I can do right now.  :)


- Darryl

 ----------------------------------------------------------------------
  Darryl Friesen, B.Sc., Programmer/Analyst    Darryl.Friesen@usask.ca
  Education & Research Technology Services,     http://gollum.usask.ca/
  Department of Computing Services,
  University of Saskatchewan
 ----------------------------------------------------------------------
  "Go not to the Elves for counsel, for they will say both no and yes"
Received on Sat Feb 2 07:06:59 2002