At 10:51 PM 06/29/02 -0700, Sutherland, Paul wrote:
>I want to only index certain directories on a server.
>e.g. http://something.com/foo & http://somthing.com/bar but not the rest of
This is reasonably easy with 2.1-dev and the spider.pl file if you know a
When using the spider.pl program, you can define a callback function that
is called for every URL extracted from a page. If the callback returns
false, that URL will not be added to the list of URLs.
From the top-level directory of the swish-e distribution:
and search for "test_url".
With the old -S http method, you might be able to hack up the swishspider
Perl program (located in the src directory) to filter out any links you
don't want to spider. That Perl program is the one that fetches the
remote document and extracts its links.
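swishspider itself is Perl; the Python sketch below only illustrates where
such a filter would sit: after the links are extracted from a fetched
document and before they are handed back to be spidered. It assumes the
HTML has already been fetched (there is no network code), and
KEEP_PREFIXES and wanted_links are hypothetical names, not part of
swishspider.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical prefixes to keep; every other extracted link is dropped.
KEEP_PREFIXES = ("http://something.com/foo/", "http://something.com/bar/")


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> tags in one fetched document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and strip any "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def wanted_links(base_url: str, html: str) -> list[str]:
    """Extract links from html and keep only those under KEEP_PREFIXES."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [u for u in parser.links if u.startswith(KEEP_PREFIXES)]
```

Plain string-prefix matching keeps the sketch short but is case-sensitive
and does not normalize hosts or ports; parsing each URL first is more robust.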
Received on Sun Jun 30 06:04:04 2002