max_depth only applies to spidering. If you spider, it does not matter
whether the linked-to file is in a parent or child directory, as long
as it is on the same server domain (or a same-host or follow-links
If your start document or any other spidered document contains links
to parent directories, it will index those, yes. If you only have
relative links without any "../" in them, you should stay below your
I spider with max_depth=9, which takes 7 hours on our intranet.
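
To illustrate the point about relative links, here is a minimal Python sketch. It is not spider.pl's code, only an illustration of how links resolve against the page they appear on. The helper name stays_below is hypothetical, and the start URL is the one from the question. A link without "../" resolves below the start directory, while "../" or a root-relative link can climb out of it.

```python
from urllib.parse import urljoin, urlsplit

# Start URL taken from the question; everything else is illustrative.
START_URL = "http://www.somesite.com/sub/"

def stays_below(start_url, link, page_url=None):
    """Resolve `link` against the page it appears on and report whether
    the result is on the same host and under the start URL's directory."""
    resolved = urlsplit(urljoin(page_url or start_url, link))
    start = urlsplit(start_url)
    # Directory part of the start URL, e.g. "/sub/".
    prefix = start.path.rsplit("/", 1)[0] + "/"
    return resolved.netloc == start.netloc and resolved.path.startswith(prefix)

for link in ["page.html", "deeper/more.html", "../index.html", "/index.html"]:
    print(link, stays_below(START_URL, link))
```

If I recall the spider.pl documentation correctly, a check like this would go into a test_url callback in the spider config, so that links outside the directory are skipped rather than followed.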
On 1/12/07, andy rosbrook <email@example.com> wrote:
> Hello all,
> I am curious about how the max_depth setting works in spider.pl with sub
> domains. For example, if I index the URL www.somesite.com/sub/ and set
> max_depth to 2, will the spider stay within the sub folder for links, or will
> it look inside somesite.com?
> I've done a few tests and it seems to go back up into root folders at
> certain times, I assume when it needs more links. Can anyone explain how it
> traverses the pages and whether it is possible to limit the spider to only take
> links from the sub domain?
Received on Fri Jan 12 06:39:27 2007