Re: Behavior of max_depth in spider.pl

From: Cas Tuyn <cas.tuyn(at)not-real.gmail.com>
Date: Fri Jan 12 2007 - 14:39:26 GMT
Andy,

max_depth only applies to spidering. When you spider, it does not matter
whether the linked-to file is in a parent or a child directory, as long
as it is on the same server domain (or a same-host or follow-links
host).
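
To make that concrete, here is a rough sketch in Python. It is not
spider.pl's actual code, only an illustration under the assumption that
depth is counted in link hops from the start URL and that the only
filter is the host name. The function name, the in-memory "links" table
and the URLs are all made up for the example.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(start, links, max_depth, same_hosts=()):
    """Breadth-first walk over `links`, a dict mapping URL -> linked URLs.

    Assumption for this sketch: depth is the number of link hops from
    `start`, and a link is followed only if its host is allowed.
    """
    allowed = {urlsplit(start).netloc, *same_hosts}
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # page is at the depth limit: do not follow its links
        for target in links.get(url, []):
            if target in seen or urlsplit(target).netloc not in allowed:
                continue  # already queued, or on a host we do not spider
            seen.add(target)
            queue.append((target, depth + 1))
    return order


# Hypothetical site: the start page links up to the root, down to a
# child page, and out to another host.
links = {
    "http://www.somesite.com/sub/": [
        "http://www.somesite.com/",
        "http://www.somesite.com/sub/page.html",
        "http://other.example/",
    ],
    "http://www.somesite.com/": ["http://www.somesite.com/about.html"],
}
visited = crawl("http://www.somesite.com/sub/", links, max_depth=1)
```

In this sketch the root page is visited even though it sits above the
start directory, because nothing in the depth or host check looks at
the path. Only the link to the other host is skipped.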

If your start document or any other spidered document contains links
to parent directories, the spider will follow and index those, yes. If
you only have relative links without any "../" in them, you should stay
below your start level.
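
As an illustration of how a "../" link climbs out of the start
directory, here is a small Python sketch using the standard urljoin
function. The helper name stays_below_start and the URLs are invented
for the example; this is not how spider.pl is implemented, just one way
to express the check.

```python
from urllib.parse import urljoin, urlsplit

START = "http://www.somesite.com/sub/"  # hypothetical start URL


def stays_below_start(href, page_url=START, start=START):
    """Resolve `href` against `page_url` and report whether the result
    is on the same host as `start` and under its path."""
    target = urlsplit(urljoin(page_url, href))
    base = urlsplit(start)
    return target.netloc == base.netloc and target.path.startswith(base.path)


below = stays_below_start("docs/a.html")    # stays under /sub/
above = stays_below_start("../index.html")  # resolves to /index.html
rooted = stays_below_start("/index.html")   # leading "/" also leaves /sub/
```

Note that a link starting with "/" leaves the start directory as well,
even though it contains no "../".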

I spider with maxdepth=9, which takes 7 hours on our intranet.

Cas

On 1/12/07, andy rosbrook <andy_rosbrook@hotmail.com> wrote:
> Hello all,
>
> I am curious about how the max_depth setting in spider.pl works with
> subfolders. For example, if I index the URL www.somesite.com/sub/ and set
> max_depth to 2, will the spider stay within the sub folder when following
> links, or will it also look inside the rest of somesite.com?
>
> I've done a few tests and it seems to go back up into root folders at
> certain times, I assume when it needs more links? Can anyone explain how it
> traverses the pages, and whether it is possible to limit the spider to only
> take links from the subfolder?
>
> thanks
> andy


-- 
Bookmark  http://kayakfun.info/salsagids/  for the best salsa parties!
Received on Fri Jan 12 06:39:27 2007