Spider taking too long to index?

From: David VanHook <dvanhook(at)>
Date: Tue Oct 08 2002 - 13:50:29 GMT
Hello -- at the good suggestions of Bill and others, I've decided to make a
go of it with  It works -- but it seems to be taking
significantly longer than the times other people on this list have reported.

Last night, during a time when the site was not very busy at all, it took
3 hours and 16 minutes to index 19,277 files (a rate of 1.6 per second,
according to the SWISH report).  The total amount of CPU utilization
time was 22 minutes, 20 seconds.
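
As a rough cross-check of those figures, here is a small Python sketch that uses only the numbers quoted above. The variable names are illustrative. It suggests the CPU was busy for only a small share of the run, which would point at waiting (network, server response, or delays) rather than at the processor, though that reading is an inference and not something the SWISH report states.

```python
# Figures quoted in the post above; nothing here is measured independently.
files_indexed = 19_277
wall_seconds = 3 * 3600 + 16 * 60   # 3 h 16 min of elapsed time
cpu_seconds = 22 * 60 + 20          # 22 min 20 s of CPU time

files_per_second = files_indexed / wall_seconds
seconds_per_file = wall_seconds / files_indexed
cpu_share = cpu_seconds / wall_seconds

print(f"{files_per_second:.2f} files per second")
print(f"{seconds_per_file:.2f} seconds per file")
print(f"CPU busy for {cpu_share:.1%} of the elapsed time")
```

The rate comes out at roughly 1.6 files per second, matching the report, with the CPU busy for a little over a tenth of the elapsed time.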

The way I'm doing it is, I feed a single page which contains a
list of links to all the pages I want it to index.  That page is huge, of
course.  Then I tell to only go one level deep.  So it grabs the
first item on the page, indexes it, returns to the list, grabs the next
item,  indexes it, returns to the list, etc.  Is that not the way this
should work?  Should I modify some setting on to
account for this system?

Here's the important config parts from the

        # limit to only .html files
        # test_url    => sub { $_[0]->path =~ /\.html?$/ },

        delay_min   => .0001,     # Delay in minutes between requests
        max_depth   => 1,
        max_time    => 300,        # Max time to spider in minutes
        max_files   => 30000,       # Max Unique URLs to spider
        max_indexed => 30000,        # Max number of files to send to swish for indexing
        keep_alive  => 1,         # enable keep-alive requests
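
To gauge how much of the run the `delay_min` setting could account for, here is a short Python sketch. It assumes, purely for illustration, that the spider sleeps for the full delay once per request; the actual behavior (especially with keep-alive enabled) may differ.

```python
# Values taken from the config fragment and the file count in the post.
delay_min = 0.0001        # delay between requests, in minutes
requests_made = 19_277    # one request per indexed file (assumption)

delay_seconds = delay_min * 60                    # per-request delay in seconds
total_delay_seconds = delay_seconds * requests_made

print(f"{delay_seconds:.3f} s per request")
print(f"{total_delay_seconds / 60:.1f} min of delay over the whole run")
```

Under that assumption the configured delay adds up to only about two minutes, so it would not explain a run of more than three hours.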

Because I'm generating this list of items to index myself, I turned off the
test_url function.  But that didn't seem to help performance all that much.

I'm running this on a Netra T-1 with 256 MB of RAM and one 300 MHz SPARC
processor.  So it's a decent machine, but nothing huge.  Is that the
problem?  Any other suggestions?  Our site is real fast, so it's not the
site's performance overall, I sure don't think.

Thanks --

Dave V.

David VanHook
Director of Technology
Wine Spectator Online
Received on Tue Oct 8 13:54:19 2002