On Tue, Jul 05, 2005 at 03:19:01PM -0400, Aliasgar Dahodwala wrote:
> What I failed to find out is why the spider sleeps for something
> around 5000 x delay_sec after fetching somewhere around 5824 files
> (the exact count is 5824). In the debug file I have that many
> "sleeping 5 seconds" messages before the spider starts fetching again.
>
> So I am thinking there is a bug in there somewhere.
Sounds like it. What's magic about 5824, I wonder? In my version of
spider.pl, delay_request() is called inside the spider() function.
That's not the best place to call delay_request(), because no request
is actually being made at that point (test_url could skip the
request, for example). That's why the wait time is calculated
based on the last time a request actually completed.

A long run of "sleeping 5 seconds" messages with no requests happening
in between doesn't make sense.
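To make the wait-time calculation concrete, here is a minimal sketch of
deriving the sleep from the last completed request rather than from the
moment delay_request() happens to be called. seconds_to_wait() is a
hypothetical helper, not a function in spider.pl, and it assumes
whole-second timestamps from time().

```perl
use strict;
use warnings;

# Hypothetical helper (not spider.pl's actual code): how many seconds
# to sleep before the next request, given delay_sec, the time the last
# request really completed, and the current time.
sub seconds_to_wait {
    my ( $delay_sec, $last_completed, $now ) = @_;

    # No request has completed yet, so there is nothing to wait for.
    return 0 unless defined $last_completed;

    # Only wait for whatever part of delay_sec has not already elapsed.
    my $wait = $delay_sec - ( $now - $last_completed );
    return $wait > 0 ? $wait : 0;
}

# Example use, assuming $last_completed is updated after each real fetch:
#   my $wait = seconds_to_wait( 5, $last_completed, time );
#   sleep $wait if $wait;
```

With this approach, repeated calls that don't lead to a request (e.g.
skipped by test_url) don't each add a full delay_sec, since the clock
runs from the last completed request.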
Can you generate a simple test case? This is what I did:
test.cgi:
#!/usr/bin/speedy
use strict;
use warnings;

# Each page links to the next one (count + 1); after 6000 pages, return 404.
my $count = ( $ENV{QUERY_STRING} || '' ) =~ /count=(\d+)/ ? $1 + 1 : 1;

if ( $count > 6000 ) {
    # A blank line must separate the CGI headers from the body.
    print <<EOF;
Content-Type: text/html
Status: 404 Not Found

<html><body>Not found</body></html>
EOF
    exit;
}

print <<EOF;
Content-Type: text/html

<html>
<head><title>This is doc $count</title></head>
<body>
<a href="test.cgi?count=$count">Rec$count</a>
</body>
</html>
EOF
httpd.conf:
Include /etc/apache/modules.conf
ErrorLog error_log
PidFile pid_file
ServerName localhost
TypesConfig /dev/null
Listen 4321
DocumentRoot /home/moseley/apache
<Files test.cgi>
    Options +ExecCGI
    SetHandler cgi-script
</Files>
spider.conf:
moseley@bumby:~/apache$ cat spider.conf
@servers = (
    {
        base_url   => 'http://localhost:4321/test.cgi',
        delay_sec  => 5,
        keep_alive => 1,
        email      => 'moseley@localhost',
    }
);
Start Apache:
moseley@bumby:~/apache$ /usr/sbin/apache -d `pwd` -f httpd.conf
Run the spider (modified to print the "sleeping" message even without debugging enabled):
moseley@bumby:~/apache$ ./spider.pl spider.conf >/dev/null
./spider.pl: Reading parameters from 'spider.conf'
sleeping 5 seconds
sleeping 5 seconds
[...]
Summary for: http://localhost:4321/test.cgi
Connection: Close: 60 (0.1/sec)
Connection: Keep-Alive: 5,941 (14.5/sec)
Total Bytes: 698,679 (1704.1/sec)
Total Docs: 6,000 (14.6/sec)
Unique URLs: 6,001 (14.6/sec)
So it fetched 6000 docs, and the sleeping messages appeared as expected.
Is there a way you can demonstrate what you are seeing so I can repeat
it?
--
Bill Moseley
moseley@hank.org
Unsubscribe from or help with the swish-e list:
http://swish-e.org/Discussion/
Help with Swish-e:
http://swish-e.org/current/docs
swish-e@sunsite.berkeley.edu
Received on Tue Jul 5 13:59:01 2005