Re: [swish-e] How do I index via HTTP when authentication is

From: William M Conlon <bill(at)not-real.tothept.com>
Date: Thu Feb 21 2008 - 16:35:34 GMT

Hi Adam,

On Feb 21, 2008, at 8:10 AM, Adam Douglas wrote:

> Hi Bill.
>
>> 1.  You need to make sure your session (cookie)  is
>> maintained as you traverse from the cleartext to the
>> encrypted domains.
>
> Well, that obviously makes sense; however, I don't know how to track
> this. I check who is logged into the web site, and the swishe user
> never appears as an authenticated client but appears as a
> non-authenticated client (I see this by IP address).

When redirecting to another domain, you need to provide a means for
the session to be continued. For example, a unique identifier (e.g.,
the session ID from the cookie) could be appended to the query string.
The server to which the user is redirected uses that identifier to
re-establish the session cookie. Of course, your application server
must allow you to use the same session identifier with different domains.
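
As a rough illustration of that handoff, here is a minimal Python sketch. It assumes both domains can read one shared session store; the `SESSIONS` dict, the `sid` parameter name, and the `secure.example.com` host are all hypothetical and not taken from Adam's application.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical in-memory stand-in for a session store that both
# domains can read (a real application would use a database or cache).
SESSIONS = {}

def login(username):
    """Create a session on the first domain and return its identifier."""
    sid = secrets.token_urlsafe(16)
    SESSIONS[sid] = {"user": username}
    return sid

def redirect_url(target_base, sid):
    """Build the redirect target, carrying the session id in the query string."""
    return f"{target_base}?{urlencode({'sid': sid})}"

def resume_session(url):
    """On the second domain: look up the session named in the query string.

    Returns (sid, session_data), or (None, None) if the id is missing or unknown.
    """
    query = parse_qs(urlparse(url).query)
    sid = query.get("sid", [None])[0]
    if sid in SESSIONS:
        return sid, SESSIONS[sid]
    return None, None
```

The second server would then set its own session cookie from the recovered identifier. Passing a session id in a URL exposes it to logs and referrers, so a short-lived, single-use token is usually the safer choice.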

>
>> 2.  Does this response provide a <body> with links for the
>> spider to follow?
>
> Not exactly sure what you mean here. After a successful login, the
> login page redirects to the homepage, and yes, there are links to
> spider from that point. In my web server access logs I only see two
> log entries when I initiate the indexing. So for some reason Swish-e
> is dying at /login/.

Is spider.pl configured to know that the server to which you are
redirecting is the 'same' as the original? If not, the spider will
interpret the redirected page as an 'off-site link' and halt.
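
To show the kind of check involved, here is a small Python sketch of a same-site test. It is not spider.pl's actual code; the function name and the `example.com` hosts are made up for illustration. The spider.pl documentation describes a `same_hosts` key for listing equivalent host names, and the `same_hosts` argument below only mimics that idea.

```python
from urllib.parse import urlparse

def is_same_site(base_url, link, same_hosts=()):
    """Return True if `link` points at the base host or a listed equivalent.

    Compares host[:port] only, ignoring the scheme, so an http -> https
    redirect on the same host name still counts as the same site.
    """
    base = urlparse(base_url).netloc.lower()
    target = urlparse(link).netloc.lower()
    allowed = {base} | {h.lower() for h in same_hosts}
    return target in allowed
```

With a check like this, a redirect from `www.example.com` to `secure.example.com` is off-site unless `secure.example.com` is listed as an equivalent host, which would match a crawl that stops after a single request.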
>
> 10.10.10.4 - - [21/Feb/2008:10:07:07 -0600] "GET /robots.txt HTTP/1.1"
> 200 253 "-" "swish-e spider http://swish-e.org/"
> 10.10.10.4 - - [21/Feb/2008:10:07:09 -0600] "GET
> /login/?szID=username&szPWD=password HTTP/1.1" 302 6226 "-" "swish-e
> spider http://swish-e.org/"
>
> Honestly, I am at a loss now, as I'm not sure what to try to resolve
> this issue, let alone how to track down more information to see what
> the problem is. I'm rather confused as to why Swish-e is not getting
> past the login page. Is there some way I can see everything it's doing?

You already have all the info you need in your spider DEBUG output and
the Apache access and error logs. I think the problem arises from the
redirect to a different site. I should have seen this yesterday:

Summary for: http://blowfish.venmarces.com/login/?szID=username&szPWD=password
 Connection: Close: 1  (0.3/sec)
    Off-site links: 1  (0.3/sec)
       Unique URLs: 1  (0.3/sec)
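
If the access log grows, a small script can pull out the redirect responses so they can be compared against the spider summary. This is a minimal Python sketch that assumes Apache's combined log format with one entry per line (the entries quoted above are wrapped by the mail client); the regular expression and function name are my own.

```python
import re

# Matches the leading fields of an Apache combined-format log entry.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def redirects(log_lines):
    """Return (path, status) for every entry that got a 3xx response."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group("status").startswith("3"):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits
```

Run against the two entries quoted above, it would report only the `/login/` request with its 302 status, which is the last request the spider made before stopping.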

>
> Best,
> Adam

Received on Thu Feb 21 11:35:41 2008