Re: Limiting indexing

From: Bill Moseley <moseley(at)>
Date: Sun Jun 30 2002 - 06:00:36 GMT
At 10:51 PM 06/29/02 -0700, Sutherland, Paul wrote:
>I want to only index certain directories on a server.
>e.g. & but not the rest of
>the site

This is reasonably easy with 2.1-dev and the file if you know a
little perl.

When using the program you can define a call-back function that
gets called for every URL extracted from the page.  If the call-back
function returns false then that URL will not be added to the list of URLs
to spider.

From the top-level directory of the swish-e distribution:

   perldoc prog-bin/

and search for "test_url".
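
As a rough illustration of the idea (not copied from the distribution), a
call-back of this kind might look like the sketch below.  It assumes the
call-back receives the URL as its first argument, either as a plain string
or as an object that stringifies to the URL; check the perldoc for the real
argument list.  The directory names and example.com host are made up.

```perl
use strict;
use warnings;

# Hypothetical directory prefixes to index; replace with your own.
my @allowed_prefixes = ('/docs/', '/products/');

# Sketch of a test_url-style call-back: returns true to spider the URL,
# false to skip it.  Assumes the first argument is the URL, either as a
# plain string or as an object that stringifies to the URL.
sub test_url {
    my ($uri) = @_;
    my $url = "$uri";

    # Pull the path out of an absolute http(s) URL, dropping any
    # query string or fragment.  Anything else is rejected.
    my ($path) = $url =~ m{^https?://[^/?#]+([^?#]*)}i
        or return 0;
    $path = '/' if $path eq '';

    # Accept the URL only if its path starts with an allowed prefix.
    for my $prefix (@allowed_prefixes) {
        return 1 if index($path, $prefix) == 0;
    }
    return 0;
}
```

With this in place, a link such as http://example.com/docs/index.html would
be followed, while http://example.com/other/page.html would be skipped.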

With the old -S http method you might be able to hack up the swishspider
Perl program (located in the src directory) to filter out any links that
you don't want to spider.  That Perl program is the one that fetches the
remote document and extracts the links from it.

Bill Moseley
Received on Sun Jun 30 06:04:04 2002