I looked through the archives and docs a bit (granted, not very thoroughly)
and didn't find much on spidering sites protected by Basic
authentication, so I made this two-line change to the spider.pl program that
came with SWISH-E 2.1-dev-25. (Sorry about the long lines here.)
[moondog] diff spider.orig.pl spider.pl
268a269
> $request->authorization_basic( split(':', $server->{credentials}, 2) ) if ($server->{credentials});
661a663
> $request->authorization_basic( split(':', $server->{credentials}, 2) ) if ($server->{credentials});
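For anyone curious what that one-liner does: as far as I understand it, authorization_basic just base64-encodes "user:password" into an Authorization header, and the limit of 2 on split keeps any colons in the password intact. Here is a rough sketch using only core Perl. Note that basic_auth_header is a made-up helper name for illustration; it is not part of spider.pl or LWP.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of spider.pl or LWP): build the value of an
# HTTP Basic "Authorization" header from a 'username:password' string.
sub basic_auth_header {
    my ($credentials) = @_;

    # Limit of 2: split only on the first colon, so a password that
    # itself contains colons stays in one piece.
    my ( $user, $password ) = split /:/, $credentials, 2;

    # Treat a missing password as empty rather than undef.
    $password = '' unless defined $password;

    # The empty second argument suppresses the trailing newline.
    return 'Basic ' . encode_base64( "$user:$password", '' );
}

print basic_auth_header('username:pass:word'), "\n";
```

Without the limit, 'username:pass:word' would split into three fields and everything after the second colon would be dropped from the password.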
For this to work, you need to run swish-e something like this:
./swish-e -c swish.config -S prog
with a swish.config that looks a bit like:
IndexDir ./spider.pl
SwishProgParameters ./spider.config
and you need to add an extra option to your spider.config file, like
this:
credentials => 'username:password'
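
To show where that line would sit, here is a minimal sketch of a spider.config, assuming yours uses the usual @servers list-of-hashes layout. The base_url and email values are placeholders I invented for illustration, and credentials is the new option read by the patched lines above.

```perl
# spider.config: a sketch only, not a complete configuration
@servers = (
    {
        base_url    => 'http://www.example.com/protected/',  # placeholder URL
        email       => 'you@example.com',                    # placeholder address
        credentials => 'username:password',                  # new option used by the patch
    },
);

1;
```

Each server entry carries its own credentials value, so other sites in the same file are spidered without authentication unless they set the option too.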
In case you're wondering, yes, I made that up. You won't find it in the docs
or FAQ. And yes, it's a bit insecure, but with the right file permissions
on your config files, it shouldn't be the end of the world.
I offer this up to whoever wants it. It seems to work fine here for me,
but I haven't tested it thoroughly. There are likely other, and possibly
more secure, ways to accomplish this (and I wouldn't be opposed to hearing
them), but hey, it's 12:30am here so it's the best I can do right now. :)
- Darryl
----------------------------------------------------------------------
Darryl Friesen, B.Sc., Programmer/Analyst Darryl.Friesen@usask.ca
Education & Research Technology Services, http://gollum.usask.ca/
Department of Computing Services,
University of Saskatchewan
----------------------------------------------------------------------
"Go not to the Elves for counsel, for they will say both no and yes"
Received on Sat Feb 2 07:06:59 2002