At 09:16 AM 03/25/02 -0800, Una Cullen wrote:
>
>Is there a way to exclude the swish-e parser from indexing a file with a
>specified piece of text residing in it?
Not directly, but like anything in swish, there's a way. There are a few
options:
When using the HTML2 parser, swish can use either robots.txt or the robots
<meta> tag exclusion. Or, swish can skip HTML documents that match a given title.
If you want to ignore files based on some text, you could write a very
simple filter that uses grep and cat. Grep for the text, and if it's not found,
cat the file back to swish. Otherwise, return a title that tells swish to
skip the document.
In the config do something like:
FileFilterMatch ./f.sh %p /./
FileRules title is skip
Then use something like:
> cat f.sh
#!/bin/sh
if grep foo "$1" >/dev/null
then
    echo "<title>skip</title>"
else
    cat "$1"
fi
Myself, I would use -S prog: read in each file, look for the text, and then
pass only the files I want indexed on to swish. That avoids having swish fork
and run a shell script for every document, but it's more work.
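For what it's worth, here is a rough sketch of what such a -S prog program
might look like in plain sh. The names (feed.sh, emit_docs) and the
pattern/directory arguments are made up for illustration. It assumes the usual
prog-method output of a Path-Name: and Content-Length: header, a blank line,
and then the document itself. Treat it as a starting point, not a tested
recipe.

```shell
#!/bin/sh
# feed.sh (hypothetical name): print only the files that do NOT contain
# a pattern, each preceded by the headers swish expects from a prog program.
# Usage: feed.sh PATTERN DIR

emit_docs() {
    pattern=$1
    dir=$2
    find "$dir" -type f | while IFS= read -r file
    do
        # Files containing the pattern are never handed to swish.
        if grep -- "$pattern" "$file" >/dev/null
        then
            continue
        fi
        # Content-Length is the size in bytes; strip padding some wc's add.
        size=$(wc -c < "$file" | tr -d ' ')
        printf 'Path-Name: %s\n' "$file"
        printf 'Content-Length: %s\n' "$size"
        printf '\n'
        cat "$file"
    done
}

# Only run when called with both arguments.
if [ "$#" -ge 2 ]
then
    emit_docs "$1" "$2"
fi
```

You would then point swish at the script with something like
"swish-e -S prog -i ./feed.sh", with the pattern and directory presumably
passed in through SwishProgParameters in the config. Check the docs for
your version before relying on that.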
Does that help?
--
Bill Moseley
mailto:moseley@hank.org
Received on Mon Mar 25 18:41:35 2002