Troy Wical wrote on 06/16/2010 12:07 AM:
>
> When I run "./spider.pl spider.config > output.txt" I get the following:
>
> ###########################
> Use of uninitialized value in sprintf at /usr/lib/swish-e/spider.pl line 38.
> Use of uninitialized value in sprintf at /usr/lib/swish-e/spider.pl line 38.
You can ignore those uninitialized-value warnings. They are fixed in svn.
> /usr/lib/swish-e/spider.pl: Reading parameters from 'spider.config'
> Warning: document 'http://restricted-website.com' has no content
>
> Summary for: http://restricted-website.com
> Connection: Close: 1 (1.0/sec)
> Total Bytes: 1 (1.0/sec)
> Total Docs: 1 (1.0/sec)
> Unique URLs: 1 (1.0/sec)
> ###########################
>
> Now, there are two things that I have noticed. When I log in to this
> website via browser, the URL ends in dashboard.action, as opposed to
> something more common like .php. Also, the pop-up window to log in
> is being handled by a second URL that takes care of all the
> authentication. I'm wondering if this isn't throwing a curve ball at
> swish-e when it comes to logging in.
>
I'm sure it is. spider.pl only uses the HTTP basic authentication
mechanism.
Try turning on debugging to confirm:
http://swish-e.org/docs/spider.html#debug
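
For reference, HTTP basic authentication amounts to sending one extra
request header built from the username and password. The sketch below is
in Python purely for illustration (spider.pl itself is Perl); the function
name and the credentials are placeholders, not anything from spider.pl. A
site that logs you in through a form on a separate URL will typically
ignore this header, which would be consistent with the "no content"
result above.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value used by HTTP basic auth."""
    # Basic auth is "username:password", base64-encoded (not encrypted).
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials, for illustration only.
print("Authorization:", basic_auth_header("user", "secret"))
```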
You will probably need to hack spider.pl, or use the get_password callback,
to do the authentication piece before the spider actually does its work. If
that second window sets a cookie, you could do a POST to that login URL
with your credentials, get the returned cookie, and set it in the
spider.pl user agent for the rest of the site.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Wed Jun 16 09:31:35 2010