My MaxKeepAliveRequests is set to the default of 100 in Apache.
I agree with you. From what I could work out from the spider.pl code, the
spider should put in a delay of delay_sec between two connection requests.
I logged the debug messages and saw the "sleeping 5 seconds" message after
approximately every 100 requests, which is fine.
What I failed to find out is why the spider sleeps for something
around 5000 x delay_sec after fetching 5824 files (that is the exact
count). In the debug file I have that many "sleeping 5 seconds"
messages before the spider starts fetching again.
So I am thinking there is a bug in there somewhere.
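
To make the arithmetic concrete, here is a minimal sketch of the wait
computation from the spider.pl snippet quoted below. The helper name
wait_seconds is hypothetical (it is not in spider.pl); it returns the
number of seconds instead of sleeping, and it assumes last_response_time
is never later than the current time. The figure of 5000 sleeps is only
the approximate count from my debug log.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of spider.pl: same formula as the quoted
# snippet, but returns the wait instead of calling sleep().
sub wait_seconds {
    my ( $delay_sec, $last_response_time, $now ) = @_;
    my $wait = $delay_sec - ( $now - $last_response_time );
    return $wait > 0 ? $wait : 0;
}

my $now = time;

# Request finished just now: the full delay_sec applies.
print wait_seconds( 5, $now, $now ), "\n";

# Request finished 3 seconds ago: only the remainder is slept.
print wait_seconds( 5, $now - 3, $now ), "\n";

# Request finished long ago: no sleep at all.
print wait_seconds( 5, $now - 600, $now ), "\n";

# Approximate total stall if the 5 second sleep is repeated about 5000 times.
my $sleeps    = 5000;    # assumption: rough count of log messages
my $delay_sec = 5;
my $total     = $sleeps * $delay_sec;
printf "%d sleeps x %d s = %d s (about %.1f hours)\n",
    $sleeps, $delay_sec, $total, $total / 3600;
```

Under those assumptions a single call can never wait longer than
delay_sec, so a stall of that length would have to come from the sleep
being called over and over, which matches the repeated messages in my log.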
Regards
Aliasgar.
Bill Moseley wrote:
>On Tue, Jul 05, 2005 at 09:13:08AM -0700, Aliasgar Dahodwala wrote:
>
>
>>I am running swish-e 2.4.3 on a redhat linux box. I am using the
>>included spider.pl script to spider my website.
>>
>>My problem: When I enable the keep_alive directive of the spider program
>>and set delay_sec to 5, the spider fetches the pages at blazing
>>speed, ignoring the delay_sec directive. After going through around
>>5000 pages it then catches up on all the delay: it stops fetching any
>>more pages and just keeps sleeping for 5 seconds each time. After a long
>>wait it continues from where it left off.
>>
>>
>
>Sounds like a bug. By design it ignores the delay_sec setting in a
>keep alive connection. The point of the keep alive is to allow faster
>requests -- avoiding the time required to start up the new connection.
>
>From the docs:
>
># delay_sec
>
> This optional key sets the delay in seconds to wait between
> requests. See the LWP::RobotUA man page for more information. The
> default is 5 seconds. Set to zero for no delay.
>
> When using the keep_alive feature (recommended) the delay will be
> used only where the previous request returned a "Connection:
> closed" header.
>
>
>So after fetching 5000 docs (is your MaxKeepAliveRequests set to
>5000?) you are saying that the spider delays delay_sec seconds x 5000
>before it fetches any more documents?
>
>Let's see, the wait time is set here:
>
> my $wait = $server->{delay_sec} - ( time - $server->{last_response_time} );
> return unless $wait > 0;
> sleep( $wait );
>
>That last_response_time is the time the last request was completed,
>which should normally be almost the same as the current time, so you
>end up with delay_sec. So I don't see how it could be delaying more
>than delay_sec.
>
>Is that what you mean?
>
>
>
Received on Tue Jul 5 12:20:30 2005