spider.pl connection bug (and fix)

From: Trond Nilsen <t.nilsen(at)not-real.alchemy.co.nz>
Date: Mon Oct 07 2002 - 07:18:03 GMT

Hey all.

The spider.pl script that comes with Swish-E creates a user agent object
(LWP::UserAgent or LWP::RobotUA) for each server hash in the array built from
the config file. That object is stored in the server hash and is not released
until the whole array of servers has been processed. As a result, the
connection to each server is not closed once that server has been spidered.

Thus, when running the spider over a large number of sites, a backlog of 
unclosed connections builds up, which eventually prevents new connections from 
being opened (at least, in the case of Win32).

To fix this, I've just removed the user agent from the server hash once 
spidering of each server is complete, letting it close as control falls off 
the end of that block of code.

That is, I've added a new line

     $server->{ua} = undef;

at line 263 of 'spider.pl'.
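
To illustrate why that one line helps, here is a minimal sketch. It is not
code from spider.pl: FakeUA and peak_open_agents are hypothetical names, and
FakeUA is only a stand-in for LWP::UserAgent that counts live objects, so the
example runs without LWP or a network. It relies on Perl destroying an object
as soon as the last reference to it goes away.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::UserAgent: it only tracks how many
# agent objects are currently alive.
{
    package FakeUA;
    my $open = 0;
    sub new        { $open++; return bless {}, shift }
    sub DESTROY    { $open-- }
    sub open_count { return $open }
}

# Walk a list of server hashes the way the post describes and return the
# largest number of agents that were alive at the same time.
sub peak_open_agents {
    my ($count, $release) = @_;
    my @servers = map { +{ name => "site$_" } } 1 .. $count;
    my $peak = 0;
    for my $server (@servers) {
        $server->{ua} = FakeUA->new;        # agent stored in the server hash
        # ... spidering of this server would happen here ...
        my $alive = FakeUA::open_count();
        $peak = $alive if $alive > $peak;
        $server->{ua} = undef if $release;  # the one-line fix
    }
    return $peak;
}

printf "without fix: peak of %d live agents\n", peak_open_agents(5, 0);
printf "with fix:    peak of %d live agents\n", peak_open_agents(5, 1);
```

Without the fix every agent stays alive until the whole server list goes
away; with it, only one agent is alive at any time.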

I'm not a great or even experienced Perl coder (taught myself about a month 
ago), so there may be hidden reasons why this is a bad idea.

Either way, it works for me, and stops Swish-E (well, 'spider.pl') from dying 
(after about 130 sites).

FYI, I'm working on Win2k, using Swish-E 2.2.1 (spider.pl v1.43, apparently).

I hope this is useful. If anyone can see why this is a bad idea, please tell
me. :)

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Trond Nilsen                                                   Alchemy Group
Software Engineer                                   http://www.alchemy.co.nz
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Received on Mon Oct 7 07:30:20 2002