At 10:38 PM 02/01/02 -0800, Darryl Friesen wrote:
>I looked through the archives and docs a bit (granted, not very thoroughly)
>and didn't find too much on spidering sites protected by Basic
>authentication, so I made this 2 line change to the spider.pl program that
>came with SWISH-E 2.1-dev-25. (sorry about the long lines here)
I just updated spider.pl. The way I think it works is:
credentials => 'user:pass',
will use those credentials for all URLs. Using
base_url => http://user:email@example.com/index.html
is basically the same thing as using the "credentials" option. If an
extracted link has a username:password in the URL, it will use that over
what's defined in the config file.
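
For illustration, a config file using the "credentials" option might look
roughly like the sketch below. It assumes the @servers list-of-hashes layout
used by the spider's config file; the host name, the email value, and the
user:pass pair are placeholders, not values from this thread.

```perl
# Sketch of a spider.pl config file (placeholder values throughout).
@servers = (
    {
        base_url    => 'http://www.example.com/index.html',
        email       => 'admin@example.com',
        credentials => 'user:pass',    # used for all URLs on this server
    },
);

1;
```

Putting the pair in the URL instead (base_url =>
'http://user:pass@www.example.com/index.html') should behave basically the
same way, per the description above.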
Otherwise, if a request returns 401 you will be prompted for a username and
password for that realm.
If another document later returns 401, it checks whether a user:password has
already been used for that realm. If so, it tries that first, and prompts
again only if that fails. Not great: if one directory requires a new
user:pass, then every file in that directory gets requested twice. The first
request returns 401 and the second retries with the cached username and
password. Oh well.
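
The lookup order described above can be sketched in a few lines of Perl. This
is only an illustration of the logic, not the code in spider.pl: the helper
names and the cache key format are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helpers, not spider.pl's real code.
my %realm_cache;    # "host:port\0realm" => 'user:pass'

# Credentials to send with the first request for a URL.
# A user:pass embedded in the URL wins over the config value.
sub initial_credentials {
    my ( $url, $config_credentials ) = @_;
    return $1 if $url =~ m{^https?://([^/\@:]+:[^/\@]*)\@};
    return $config_credentials;    # may be undef: request goes out without auth
}

# After a 401: cached user:pass for this server and realm, or undef,
# in which case the caller would prompt.
sub credentials_after_401 {
    my ( $netloc, $realm ) = @_;
    return $realm_cache{"$netloc\0$realm"};
}

# Remember what worked so later 401s in the same realm can retry silently.
sub remember_credentials {
    my ( $netloc, $realm, $user_pass ) = @_;
    $realm_cache{"$netloc\0$realm"} = $user_pass;
    return;
}

# URL credentials override the config value.
print initial_credentials( 'http://bob:secret@www.example.com/a.html', 'user:pass' ), "\n";
```

Because nothing is sent preemptively for a cached realm in this sketch, each
protected file still costs two requests: the 401, then the retry. That matches
the behavior described above.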
If you have a chance, give it a try and let me know how it works.
Received on Thu Feb 7 01:57:06 2002