Re: [swish-e] How do I index via HTTP when authentication is required?

From: Bill Moseley <moseley(at)>
Date: Thu Jan 24 2008 - 13:56:02 GMT

On Wed, Jan 23, 2008 at 01:47:39PM -0600, Peter Karman wrote:
> On 01/23/2008 01:32 PM, Adam Douglas wrote:
> > Hi Peter. Thanks for the reply and pointing this out as I did not see
> > this information. However it appears after reading what you have pointed
> > out that Swish-E only supports Basic Authentication correct? If so that
> > would not work then that I know of, as in my case authentication is done
> > via a web form and then handled server side using PHP/MySQL with the use
> > of a cookie on the client side to maintain the client's session until
> > logout.
> > 
> > Is there any other way? I do realize I could provide the username and
> > password via the URL to the form, however this is dangerous and
> > presently the form only accepts via POST not GET. I believe when a
> > username and password is provided as query string this is treated as a
> > GET.
> > 
> The way I would do it would be to use WWW::Mechanize (or even LWP::UserAgent)
> directly to POST the user form, obtain the cookie, and then stick that cookie
> into the request cookie jar. There aren't any ready-made hooks in to do that
> AFAIK, so you'd have to hack yourself (look at the authentication pieces in
> the code) or ask/barter/plead with someone on this list to help you.

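There's a "use_cookies" flag in the spider config. Turn it on so the
spider keeps a cookie jar and sends the session cookie with later requests.

Then in the spider config file set a global: my $LOGGED_IN = 0;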
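
As a rough sketch of where those two pieces sit, assuming the usual
spider.pl config layout in which @servers holds one hash per site (the
base_url and email values are placeholders, not from this thread):

    my $LOGGED_IN = 0;    # file-scoped flag; the login callback increments it

    @servers = (
        {
            base_url    => 'http://www.example.com/',    # placeholder
            email       => 'webmaster@example.com',      # placeholder
            use_cookies => 1,    # keep a cookie jar so the session cookie persists
            # the test_url callback goes here
        },
    );

    1;    # the config file should end on a true value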

Then in test_url(), which is called before a request is made, do something
like this (untested!):

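    test_url => sub {
        my ( $uri, $server ) = @_;

        # Log in only once, before the first request. Always return true,
        # because a false return tells the spider to skip this URL.
        return 1 if $LOGGED_IN++;

        # Assumes this key holds the spider's LWP::UserAgent object
        # (untested; check spider.pl for the right key).
        my $ua = $server->{agent};
        my $login = '';    # URL that the login form POSTs to
        my %credentials = (
            username => 'admin',
            password => 'password',
        );

        my $response = $ua->post( $login, \%credentials );

        unless ( $response->is_success ) {
            die "failed to log in " . $response->status_line;
        }

        return 1;
    },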

Or something like that.

Bill Moseley

