Re: Excluding Files

From: Bill Moseley <moseley(at)>
Date: Mon Mar 25 2002 - 18:37:23 GMT
At 09:16 AM 03/25/02 -0800, Una Cullen wrote:
>Is there a way to exclude the swish-e parser from indexing a file with a
>specified piece of text residing in it?

Not directly, but like anything in swish, there's a way:

There are a few options:

When using the HTML2 parser, swish can use either robots.txt or the robots
<meta> tag for exclusion.  Or, swish can skip HTML documents that match a
given title.

If you want to ignore files based on some text you could write a very
simple filter that uses grep and cat.  Grep for the text, and if it is not
found, cat the file back to swish.  Otherwise, return a title that tells
swish to skip the file.

In the config do something like

  FileFilterMatch .\ %p /./
  FileRules title is skip

Then use something like:

> cat

#!/bin/sh
if grep foo "$1" >/dev/null; then
   echo "<title>skip</title>"
else
   cat "$1"
fi

Myself, I would use -S prog: read in each file, look for the text, and then
pass only the files I want indexed on to swish.  That avoids having swish
fork and run a shell script for every document, but it's more work.
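
A rough sketch of that -S prog approach, written in shell to match the
filter above.  It assumes the prog input format of a Path-Name header and a
Content-Length header, a blank line, and then the document itself.  The
function name and the two arguments (directory, text to exclude on) are
made up for illustration.

```shell
#!/bin/sh
# Sketch of a "-S prog" feeder: prints only files that do NOT contain a pattern.
# Assumed (hypothetical) arguments: $1 = directory to walk, $2 = text to exclude on.

emit_unmatched() {
    dir=$1
    pattern=$2
    find "$dir" -type f | while IFS= read -r file; do
        # Skip any file that contains the pattern (fixed string, quiet match).
        if grep -qF -- "$pattern" "$file"; then
            continue
        fi
        # Headers, a blank line, then exactly Content-Length bytes of document.
        size=$(wc -c < "$file" | tr -d ' ')
        printf 'Path-Name: %s\n' "$file"
        printf 'Content-Length: %s\n' "$size"
        printf '\n'
        cat "$file"
    done
}

# Run only when both arguments are supplied.
if [ "$#" -ge 2 ]; then
    emit_unmatched "$1" "$2"
fi
```

Swish would then run this one program instead of a filter per document,
with the directory and the text passed in as its two arguments.  Note that
a shell version still starts grep and cat for each file; reading the files
inside a single Perl program would avoid that as well.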

Does that help?

Bill Moseley
