Re: Double Slashes When Spidering

From: Michael Tsai <lists(at)not-real.mjtsai.com>
Date: Fri Jan 24 2003 - 02:10:38 GMT
On Wednesday, January 22, 2003, at 02:32  PM, Michael Tsai wrote:

> The problem is that the spider goes into an infinite loop. After going
> through all the pages on the site, it starts printing out entries like:
>
>     Processing http://www.atpm.com//2.07/index.shtml...
>     Processing http://www.atpm.com//2.06/index.shtml...
>
> where it adds a second forward slash after the domain name. If I leave
> it running long enough, it makes another pass over the pages with three
> slashes.

I was able to stop this from happening by putting:

	return if $uri->as_string =~ m[atpm\.com//];

in the test_url callback in spider.conf, so that any URL with a double
slash right after atpm.com is skipped.
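
For anyone who wants the same check without hard-coding the host name, here is a rough sketch of a more general version. It assumes, as I understand the spider documentation, that test_url is passed a URI object as its first argument and that a true return value means "go ahead and fetch". The sub body and the little loop at the bottom are my own illustration, not something taken from spider.pl.

```perl
use strict;
use warnings;
use URI;

# Hypothetical generalisation of the one-line fix above: reject any URL
# whose path contains an empty segment ("//"), whatever the host is.
sub test_url {
    my ($uri, $server) = @_;
    return if $uri->path =~ m{//};   # e.g. http://host//2.07/index.shtml
    return 1;                        # true value: let the spider fetch it
}

# Quick check outside the spider, using the URLs from the report.
for my $url ('http://www.atpm.com/2.07/index.shtml',
             'http://www.atpm.com//2.07/index.shtml') {
    my $verdict = test_url(URI->new($url)) ? 'fetch' : 'skip';
    print "$verdict $url\n";
}
```

In spider.conf the sub would be attached to the server entry as test_url => \&test_url (or written inline as an anonymous sub). Matching on the path rather than the whole string keeps the "//" in "http://" from triggering the test.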



--Michael
Received on Fri Jan 24 02:10:59 2003