Re: Recursive Spidering in Windows

From: <moseley(at)not-real.hank.org>
Date: Mon Oct 27 2003 - 16:22:52 GMT
On Mon, Oct 27, 2003 at 07:28:21AM -0800, narayananps@hp.com wrote:
> But I am not able to get the spider recurse thru all the links in
> index.html.
> I see from the perl doc that the default html tags for links is <a> So I
> dont specify it in my conf.
> Still I am not able to do a recursive spidering.

Enable debugging.  One debug option shows the links extracted from 
each document, and another reports which links are skipped and why.
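
As a rough sketch of what that could look like in a spider config file: the fragment below assumes the `debug` key and the `DEBUG_LINKS` / `DEBUG_SKIPPED` constants described in the spider.pl documentation of that era, so check `perldoc spider.pl` for your version before relying on it. The `base_url` and `email` values are placeholders, not taken from the original post. It is a config fragment that spider.pl loads, not a standalone script.

```perl
# SwishSpiderConfig.pl (fragment; loaded by spider.pl, not run on its own)
# ASSUMPTION: the DEBUG_* constants are supplied by spider.pl itself.
@servers = (
    {
        base_url => 'http://localhost/index.html',   # placeholder URL
        email    => 'admin@example.com',             # placeholder address

        # Print each link as it is extracted, and each link that is
        # skipped along with the reason it was skipped.
        debug    => DEBUG_LINKS | DEBUG_SKIPPED,
    },
);

1;
```

The same documentation also describes a `SPIDER_DEBUG` environment variable (for example `links,skipped`) as an alternative to editing the config file.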

> Also, is there a utility to configure the spider via a proxy on windows ( i
> want to spider an external site from inside a firewall) ? 

Run perldoc LWP::UserAgent.  It describes how to use a proxy with LWP 
(which is what spider.pl uses).  I don't think there's an easy way to 
enable the proxy from within the config file, but it should be easy to 
see where in spider.pl to call $ua->proxy or $ua->env_proxy.
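
For illustration only, here is a minimal sketch of the two LWP::UserAgent calls mentioned above. The helper name `make_proxied_agent` and the proxy URL are hypothetical and are not part of spider.pl; the sketch shows the calls in isolation and does not say where spider.pl creates its user agent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of spider.pl): build a user agent that
# sends its requests through a proxy.
sub make_proxied_agent {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new;

    if (defined $proxy_url) {
        # Explicit proxy for the listed URL schemes.
        $ua->proxy(['http', 'ftp'], $proxy_url);
    }
    else {
        # No URL given: pick up http_proxy, ftp_proxy, no_proxy and
        # similar settings from the environment instead.
        $ua->env_proxy;
    }
    return $ua;
}

# The proxy URL is a placeholder; substitute your firewall's proxy.
my $ua = make_proxied_agent('http://proxy.example.com:8080/');
print "http requests go through: ", $ua->proxy('http'), "\n";
```

With `env_proxy`, nothing needs to be hard-coded: on Windows the proxy can be set with `set http_proxy=...` in the shell before running the spider.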


-- 
Bill Moseley
moseley@hank.org
Received on Mon Oct 27 16:35:20 2003