I finished installing swish-e 2.2.1 and verified that it is working
properly by indexing and searching some sample HTML pages with spider.pl
via the prog method.
Now that I am ready to index my site, a question has occurred to me: how
does spider.pl know when to stop crawling? Does it index only pages on a
given server and/or domain, or does it follow every link it encounters,
including links to sites on other servers and/or domains? For instance,
if my site in the ny.frb.org domain links to pages on www.firstgov.org,
will spider.pl also index pages in the firstgov.org domain? If so, how
can one limit spider.pl to index only pages in a certain domain and
ignore all pages in other domains?
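The usual way a crawler stays within one domain is to resolve each link it finds and compare the link's host against an allowed domain before following it. Below is a minimal, generic sketch of that check in Python. It is not spider.pl's code, and spider.pl's own (Perl) configuration file may offer its own settings for this; the function names `in_allowed_domain` and `links_to_follow` are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin, urlsplit


def in_allowed_domain(url, allowed_domain):
    """Return True if url's host is allowed_domain itself or a subdomain of it."""
    # urlsplit().hostname is already lowercased; it is None for URLs without a host.
    host = urlsplit(url).hostname or ""
    allowed = allowed_domain.lower().lstrip(".")
    # Require an exact match or a dot boundary, so "evilny.frb.org"
    # does not count as part of "ny.frb.org".
    return host == allowed or host.endswith("." + allowed)


def links_to_follow(page_url, hrefs, allowed_domain):
    """Resolve each href against page_url and keep only in-domain http(s) links."""
    keep = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # turn relative links into absolute URLs
        if urlsplit(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, ftp:, javascript:, etc.
        if in_allowed_domain(absolute, allowed_domain):
            keep.append(absolute)
    return keep


if __name__ == "__main__":
    page = "http://www.ny.frb.org/index.html"
    found = ["/about.html", "http://www.firstgov.org/", "mailto:someone@example.com"]
    print(links_to_follow(page, found, "ny.frb.org"))
```

With these inputs, only the relative link is kept (as `http://www.ny.frb.org/about.html`); the off-site link and the `mailto:` link are dropped. Matching on a dot boundary rather than a plain substring is the design choice that keeps look-alike hosts out.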
Shen C. Yang
Information Technology Specialist
Federal Reserve Bank of New York - www.newyorkfed.org
Technology Support Division
Internal Communications and Multimedia Services
tel: (212) 720 2857
Received on Tue Oct 29 21:15:55 2002