Re: Defaults for -S http method

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Apr 04 2003 - 06:35:32 GMT
On Thu, 3 Apr 2003, Greg Fenton wrote:

> 
> --- Bill Moseley <moseley@hank.org> wrote:
> > 
> > See any problems with those changes?
> > 
> 
> 
> Well, it's an argument either way.  The problem with the values you
> propose is that someone who doesn't know what they are doing could
> cause web-havoc.

I suppose.  I'm saying to change the Delay to five seconds between
requests.  That's 12 requests per minute.  I just posted a simple
mod_perl/swish-e script the other day that was running a query (read:
dynamic site)  that was fetching something close to 150 requests per
*second*.  http://www.chamas.com/bench/ has examples of 1000+ requests per
second (although for simple pages).  For static sites I doubt 12/sec is an
issue.  Larger pages will be slower, but if 12/minute is enough to
cause havoc then I suspect there are other problems that need attention.

Delay also used to count time from the *start* of one request to the
start of the next.  The docs say for "Delay":

  The number of seconds to wait between issuing requests to a server

which was not really correct.  If it was set for a minute and a document
took a minute to download then there was no "delay" before the start of
the next request.  Now it makes a request, waits "delay" seconds, then
makes the next request.
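
To make the difference concrete, here is a rough Python model of the two
behaviours.  It is only a sketch of the timing described above, not the
actual Swish-e or spider.pl code; the function names and the sample
download times are made up for illustration.

```python
def start_times_old(durations, delay):
    """Model of the old behaviour: 'delay' is measured from the start of
    one request to the start of the next (assumed model, not real code)."""
    starts, t = [], 0
    for d in durations:
        starts.append(t)
        # Next start is 'delay' after this start, or right after the
        # download finishes if the download took longer than 'delay'.
        t = max(t + delay, t + d)
    return starts


def start_times_new(durations, delay):
    """Model of the new behaviour: wait 'delay' after each request ends."""
    starts, t = [], 0
    for d in durations:
        starts.append(t)
        t = t + d + delay
    return starts


def idle_gaps(starts, durations):
    """Idle time between the end of one request and the start of the next."""
    return [starts[i + 1] - (starts[i] + durations[i])
            for i in range(len(starts) - 1)]


# Hypothetical download times in seconds, with Delay set to one minute.
durations = [60, 10, 60]
print(idle_gaps(start_times_old(durations, 60), durations))  # [0, 50]
print(idle_gaps(start_times_new(durations, 60), durations))  # [60, 60]
```

In the old model the first document takes the full minute, so the next
request follows with no pause at all; in the new model the server always
gets the full "delay" of idle time.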

So by today's standards, one request per minute seems a bit much.

Note that if using -S prog with spider.pl and Keep Alives, it could
possibly be more load on the server to fetch one per minute than one
every five seconds, because of Keep Alive timeouts.  The default
KeepAliveTimeout for Apache is 15 seconds, IIRC.  Ok, that's probably
wrong, but if you were fetching one every 16 seconds then it would likely
be true.
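
A very simplified Python sketch of that Keep Alive point follows.  It
assumes the server drops an idle connection once the gap between
requests reaches the timeout, treats requests as instantaneous, and
ignores everything else (MaxKeepAliveRequests, client behaviour, and so
on).  The function name and the 15 second default are assumptions taken
from the paragraph above, not measured values.

```python
def connections_opened(num_requests, interval, keepalive_timeout=15):
    """Count connections needed for evenly spaced requests.

    Assumed model: a connection is reused only if the idle gap between
    two requests is shorter than the server's keep-alive timeout.
    """
    if num_requests == 0:
        return 0
    opened = 1  # the first request always opens a connection
    for _ in range(num_requests - 1):
        if interval >= keepalive_timeout:
            opened += 1  # the idle connection timed out; reconnect
    return opened


for interval in (5, 16, 60):
    print(interval, connections_opened(12, interval))
# 5 1
# 16 12
# 60 12
```

Under this model a five second delay reuses one connection for all
twelve requests, while anything at or above the timeout pays for a new
connection every time.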

As for the MaxDepth, I think I've seen one or two people that have posted
wondering why their entire site was not indexed --- which turned out to be
the MaxDepth setting.  Nobody has complained that Swish indexed too much,
or expected that it would index only the top one or two levels.

As it is currently, you can have a six page site and not have it all
indexed. So, although it's nice to have that configuration option, it's
hard to have a default.  It's just hard to imagine that someone actually
depends on that default for indexing just five levels deep into their
site.
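
Here is a small Python sketch of how a depth limit can cut off a tiny
site.  It is a generic breadth-first crawl, not spider.pl; the site
layout, the page names, and the way depth is counted (start page is
depth 0) are all assumptions for the example.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Breadth-first crawl that does not follow links from pages
    already at max_depth.  'links' maps a page to the pages it links to."""
    seen = {start}
    queue = deque([(start, 0)])
    indexed = []
    while queue:
        page, depth = queue.popleft()
        indexed.append(page)
        if depth >= max_depth:
            continue  # depth limit reached: index the page, skip its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return indexed


# Hypothetical six page site where each page links only to the next.
site = {"p1": ["p2"], "p2": ["p3"], "p3": ["p4"],
        "p4": ["p5"], "p5": ["p6"], "p6": []}

print(crawl(site, "p1", 4))  # ['p1', 'p2', 'p3', 'p4', 'p5']
print(crawl(site, "p1", 5))  # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

With the lower limit the last page is silently left out of the index,
which is the kind of surprise people have posted about.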

Seem reasonable, or have I finally lost my mind?


-- 
Bill Moseley moseley@hank.org
Received on Fri Apr 4 06:43:19 2003