
Re: [swish-e] Having trouble trying to ignore invalid tags in HTML docs

From: Kathleen Vignos <kathleen(at)>
Date: Fri Mar 02 2007 - 22:57:28 GMT
Hi Bill,

Thank you so much.  That did the trick.  I had previously tried something
like ParserWarningLevel and when it gave an error, I thought I had the wrong
config file or the wrong config setting!  Thanks so much for the help.
Everything seems to be working great now, and my 100,000 files indexed
fairly well with the -e turned on.

Best Regards, 

-----Original Message-----
[] On Behalf Of Bill Moseley
Sent: Thursday, March 01, 2007 4:06 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] Having trouble trying to ignore invalid tags in

On Thu, Mar 01, 2007 at 02:58:15PM -0800, Kathleen Vignos wrote:
> I've tried the following in the config file (swish.conf), with IgnoreWords
> by itself, then IgnoreMetaTags by itself, then added UndefinedMetaTags.
> I get the exact same results/errors each time.  I also tried commenting out
> "DefaultContents HTML*" and also got the same results/errors (shown at the
> bottom of this message).
> # Tell swish-e what to index
> IndexDir /usr/local/apache/htdocs/documents/
> # Only index HTML files
> IndexOnly .htm .html
> # Use the HTML parser
> DefaultContents HTML*
> # Ignore words list
> IgnoreWords /usr/local/apache/swish-e-2.4.5/ignorewords.txt
> # Ignore certain tags
> UndefinedMetaTags ignore
> I continue to get the following error messages:
> /usr/local/apache/htdocs/documents/doc.htm:1: error: Tag document invalid
>          ^

Ya, that's an error from libxml2.  IgnoreMetaTags doesn't disable
libxml2's warnings -- for that you need to change ParserWarnLevel.

    ParserWarnLevel 1
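Putting the pieces together, a minimal swish.conf for this case might look
like the sketch below. It only combines directives already quoted in this
thread with ParserWarnLevel 1; the IgnoreWords line is left out, and the
paths are the ones from the original message, so adjust them for your own
installation.

    # Sketch only: every directive here is taken from this thread
    # Tell swish-e what to index
    IndexDir /usr/local/apache/htdocs/documents/
    # Only index HTML files
    IndexOnly .htm .html
    # Use the HTML parser (libxml2 in this setup, per the error above)
    DefaultContents HTML*
    # Ignore undefined meta tags
    UndefinedMetaTags ignore
    # Turn down libxml2's error/warning reporting (value suggested above)
    ParserWarnLevel 1

The index could then be built with something like "swish-e -c swish.conf -e",
where -c names the config file and -e is the low-memory economy mode
mentioned earlier in this thread.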

Bill Moseley

Unsubscribe from or help with the swish-e list:

Help with Swish-e:

Users mailing list
Received on Fri Mar 2 17:54:03 2007