At 01:11 PM 10/29/02 -0800, Shen Yang wrote:
>Now that I am ready to index my site, a question occurred to me: how
>does spider.pl know when to stop crawling? Does the spider only index pages
>on a given server and/or domain, or does spider.pl follow all the
>links that it encounters, including links to sites on other servers?
It spiders one server, which is defined by a host name and a port number.
>For instance, if my site in the domain ny.frb.org has
>links to pages on www.firstgov.org, does that mean that spider.pl
>will also index pages in the first.gov domain?
The configuration file defines a Perl array, with each element of the array
being a separate server config (represented by a Perl hash). This allows
you to index multiple servers. See:
For a given server, you can use the "same_hosts" setting to say that
www.frb.org and frb.org are the same server.
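
As a rough sketch of what such a config file might look like: only
"same_hosts" is named above, so the array name (@servers), the other keys
(base_url, email), and the array-ref form of same_hosts are assumptions
here and should be checked against the spider.pl documentation. The
addresses are placeholders.

```perl
# Hypothetical spider.pl config sketch. Every name except same_hosts
# is an assumption; verify against the spider.pl docs before use.
@servers = (
    {
        # First server config: crawling stays on this host and port.
        base_url   => 'http://www.frb.org/',
        # Treat links to frb.org as links to the same server.
        same_hosts => [ 'frb.org' ],
        email      => 'webmaster@example.invalid',   # placeholder contact
    },
    {
        # Second, independent server config, spidered on its own.
        base_url   => 'http://ny.frb.org/',
        email      => 'webmaster@example.invalid',   # placeholder contact
    },
);

1;  # end with a true value in case the file is loaded with require
```

Each hash is spidered separately, so a link from one server to the other
is not followed; the second server is only indexed because it has its own
entry.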
There's currently no way to say "index www.frb.org, but also follow links
from www.frb.org to a list of other servers."
Received on Tue Oct 29 21:48:01 2002