
Re: [swish-e] How do I index via HTTP when authentication is required?

From: William M Conlon <bill(at)not-real.tothept.com>
Date: Tue Feb 05 2008 - 22:14:11 GMT
Hi Adam,

I'm not sure why it's any more dangerous to require/allow the swish-e
spider to log in to an application than any other user agent that
presents credentials. In fact, for a public-facing application, far
more checks can be applied to the spider (username/password, IP
address, a one-of-a-kind user agent) than is feasible with a normal
user's login.
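
To make the point about stricter checks concrete, here is a minimal Perl sketch of a login handler that admits the spider account only when every condition holds. The sub name spider_login_allowed, the %SPIDER settings, the IP address 192.0.2.10 and the user-agent string are hypothetical placeholders, not part of swish-e or of any real application.

```perl
use strict;
use warnings;

# Hypothetical settings for the dedicated spider account (placeholders only).
# A real application should compare password hashes, not plaintext.
my %SPIDER = (
    user       => 'swishe',
    password   => 'swishe',
    ips        => { '192.0.2.10' => 1 },    # allowlisted crawler address
    user_agent => 'swish-e-spider-example/1.0',
);

# Returns 1 only if username, password, source IP and user agent all match.
sub spider_login_allowed {
    my (%req) = @_;
    return 0 unless defined $req{user}        && $req{user} eq $SPIDER{user};
    return 0 unless defined $req{password}    && $req{password} eq $SPIDER{password};
    return 0 unless defined $req{remote_addr} && $SPIDER{ips}{ $req{remote_addr} };
    return 0 unless defined $req{user_agent}  && $req{user_agent} eq $SPIDER{user_agent};
    return 1;
}

my $ok = spider_login_allowed(
    user        => 'swishe',
    password    => 'swishe',
    remote_addr => '192.0.2.10',
    user_agent  => 'swish-e-spider-example/1.0',
);
print $ok ? "allowed\n" : "denied\n";
```

A normal user's login can rarely be pinned to one address and one user-agent string, which is why these extra checks are practical only for a dedicated crawler account.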

Enabling cookies by itself presents just as much risk of forgery.

Anyway, here's a snip from my @servers:

@servers = (
         {
         base_url    => 'http://my.domain.com/login.app?_function=checkpw&userid=swishe&password=swishe&remember=no',
         use_cookies => 1,
#        debug       => DEBUG_URL | DEBUG_SKIPPED | DEBUG_FAILED | DEBUG_HEADERS,
         delay_sec   => 1,
         test_url    => sub {
                 my $uri = shift;    # the candidate link, as a URI object
                 # Don't follow the logout link, or the session is lost.
                 my $ok = !( $uri->path =~ /login\.app/
                             && ( $uri->query // '' ) =~ /_function=logout/ );
                 return 1 if $ok;
                 return;
         },
...

Essentially, the spider logs in as the user 'swishe', so it sees the
same content as any similarly privileged user. remember=no tells the
application not to give swish-e a long-term cookie to re-authenticate
with. use_cookies allows the application to issue, and swish-e to
present, the session cookies needed for access. test_url keeps the
spider from following the logout link, so the session stays valid and
the spider can follow all the other links.
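
The test_url logic can also be tried outside the spider. This is a minimal sketch, assuming the URI module from CPAN (which LWP-based tools such as spider.pl already rely on); the sub name skip_logout and the sample URLs are illustrative only.

```perl
use strict;
use warnings;
use URI;

# Same rule as the test_url callback: reject a link to login.app whose
# query string asks for _function=logout; follow everything else.
sub skip_logout {
    my ($uri) = @_;    # a URI object, as spider.pl hands to test_url
    my $is_logout = $uri->path =~ /login\.app/
        && ( $uri->query // '' ) =~ /_function=logout/;
    return $is_logout ? 0 : 1;
}

for my $u (
    'http://my.domain.com/login.app?_function=logout',
    'http://my.domain.com/login.app?_function=checkpw&userid=swishe',
    'http://my.domain.com/docs/page.html',
) {
    printf "%-65s %s\n", $u, skip_logout( URI->new($u) ) ? 'follow' : 'skip';
}
```

The "// ''" guard matters because URI returns undef for a URL with no query string, which would otherwise trigger an "uninitialized value" warning on every plain page.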

Bill


On Feb 5, 2008, at 1:05 PM, Adam Douglas wrote:

> Hi William. Well, that would be a workable solution, but not one that
> should be used, in my opinion. It's too dangerous and should not be
> necessary. Thanks for the reply and suggestion.
>
> Best,
> Adam
>
>> Date: Wed, 23 Jan 2008 13:14:20 -0800
>> From: William Conlon <bill@tothept.com>
>> Subject: Re: [swish-e] How do I index via HTTP when authentication is required?
>> To: Swish-e Users Discussion List <users@lists.swish-e.org>
>>
>> I wrote a backdoor in my login application that allows specified IP
>> addresses to log in via GET, in order to have a simple way
>> for swish-e to access protected content.
>>
>> Then just create a username/password combination for swish-e to log in
>> with.
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users

Received on Tue Feb 5 17:14:14 2008