Re: [swish-e] How do I index via HTTP when authentication is

From: Adam Douglas <ADouglas(at)not-real.venmarces.com>
Date: Tue Feb 05 2008 - 21:29:01 GMT
Well, now this sounds like the solution I'm looking for, Peter and Bill.

>> the way I would do it would be to use WWW::Mechanize (or even
>> LWP::Agent) directly to POST the user form, obtain the cookie, and
>> then stick that cookie into the spider.pl request cookie jar. There
>> aren't any ready-made hooks in spider.pl to do that AFAIK, so you'd
>> have to hack spider.pl yourself (look at the authentication pieces in
>> the code) or ask/barter/plead with someone on this list to help you.

Which one should I use, WWW::Mechanize or LWP::Agent (I assume
LWP::UserAgent is meant)? It appears WWW::Mechanize is what I need, but
I'm not sure which of the two is best to go with. Comments?

>There's a "use_cookies" flag.

How do you use this flag? I see it in spider.pl, along with a few
comments on it. Do I set use_cookies to true, or what value do I give
it? Also, is this done in the config file or in spider.pl itself?
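
To make the question concrete, here is my guess at what it would look
like in the spider config file. This is only a sketch based on my
reading of the spider.pl comments: the base_url and email values are
placeholders, and I am assuming use_cookies is a per-server key that
just needs a true value.

```perl
# Spider config file (sketch only; all values are placeholders).
# Assumption: use_cookies is set per server and any true value enables it.
@servers = (
    {
        base_url    => 'http://example.com/',
        email       => 'admin@example.com',
        use_cookies => 1,    # keep a cookie jar while spidering
    },
);

1;
```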

It's interesting, Bill, how you state in the comments of spider.pl that
"Some (poorly written ;) sites require cookies to be enabled on
clients". I by no means call myself an expert who knows it all, but I
can't see how the session would be kept without the use of a cookie. An
IP address alone is not enough, so what other unique value can you use
to link the session to the client's browser agent?
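
To show what I mean by the cookie carrying the session, here is a small
sketch using HTTP::Cookies from libwww-perl, with no network access. The
login URL, the page URL, and the "session=abc123" cookie are all made up
for illustration. The login response is built by hand to stand in for
what a real server might send.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# Hypothetical login exchange: the response sets a session cookie.
my $login_request  = HTTP::Request->new( POST => 'http://example.com/login' );
my $login_response = HTTP::Response->new( 200, 'OK' );
$login_response->header( 'Set-Cookie' => 'session=abc123; path=/' );
$login_response->request($login_request);

# The cookie jar records the cookie from the response...
my $jar = HTTP::Cookies->new;
$jar->extract_cookies($login_response);

# ...and attaches it to later requests to the same site, which is
# what ties those requests to the logged-in session.
my $next_request =
    HTTP::Request->new( GET => 'http://example.com/private/page.html' );
$jar->add_cookie_header($next_request);

my $cookie_header = $next_request->header('Cookie') || '';
print "Cookie header on next request: $cookie_header\n";
```

As far as I understand it, an LWP user agent with a cookie jar attached
does these two steps automatically on every request and response.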

>Then in the spider config file set a global my $LOGGED_IN = 0;

Okay that is easy enough to do.

Now, where the heck do I put the code below? I took a guess and placed
it at line 144 of spider.pl. Also, is this code meant to be used with
WWW::Mechanize or LWP::UserAgent? Whichever one is used, I assume I have
to add a "use Module;" statement at the top of spider.pl to load it. I'm
really surprised no one has run into this situation or made this type of
code available. I'll be more than willing to post the solution here and
on my blog once it's found to work. The next step is to figure out where
to call test_url. I'm not entirely sure where, but I could make a few
guesses and see what results I get. At this point I just need to know
whether all I have to do is load whichever module is best,
LWP::UserAgent or WWW::Mechanize, and in theory this should work fine.


Any additional comments to push me closer to understanding all this
would be appreciated.

>Then in test_url() which happens before making a request do something
>like (untested!)
>
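>    test_url => sub {
>        my ( $uri, $server ) = @_;
>
>        # Log in only on the first call; let every later URL through.
>        return 1 if $LOGGED_IN++;
>
>        # Assumed to hold the spider's LWP user agent object (untested)
>        my $ua = $server->{agent};
>        my $login = 'http://example.com/login';
>        my %credentials = (
>            username => 'admin',
>            password => 'password',
>        );
>
>        # POST the login form; the session cookie lands in the agent's jar
>        my $response = $ua->post( $login, \%credentials );
>
>        unless ( $response->is_success ) {
>            $server->{abort}++;
>            die "failed to log in " . $response->status_line;
>        }
>
>        return 1;    # a true value tells spider.pl to fetch $uri
>    },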

Received on Tue Feb 5 16:29:02 2008