Ignoring <script> contents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Jan 11 2002 - 23:35:25 GMT
Well, it was only a few changes.

This only applies to the HTML2 parser.  HTML2 requires building swish-e with
libxml2.

To ignore content inside <script> or <style> (or any other) HTML tags, use:

  IgnoreMetaTags script style
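To make the effect concrete, here is a rough Python sketch of what ignoring <script> and <style> content means for the words that end up in the index. It is not swish-e's code (swish-e does this in C on top of libxml2). Python's standard html.parser stands in for libxml2, and the names IgnoringIndexer and index_words are made up for illustration.

```python
from html.parser import HTMLParser


class IgnoringIndexer(HTMLParser):
    """Collect words, skipping any text nested inside an ignored tag (illustrative only)."""

    def __init__(self, ignore_tags):
        super().__init__()
        self.ignore_tags = {t.lower() for t in ignore_tags}
        self.ignore_depth = 0  # how many ignored tags we are currently inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercased tag names
        if tag in self.ignore_tags:
            self.ignore_depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignore_tags and self.ignore_depth > 0:
            self.ignore_depth -= 1

    def handle_data(self, data):
        # Only text outside every ignored tag is indexed
        if self.ignore_depth == 0:
            self.words.extend(data.lower().split())


def index_words(html, ignore_tags=("script", "style")):
    """Hypothetical helper: return the indexable words of an HTML string."""
    parser = IgnoringIndexer(ignore_tags)
    parser.feed(html)
    parser.close()
    return parser.words
```

With a page like `<style>p {color: red}</style><script>var x = 1;</script><p>Hello World</p>`, only "hello" and "world" come back; passing an empty ignore list lets the script and style text through again.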

I relaxed the rules on how HTML is parsed, and it's now parsed more like
XML, in that HTML markup tags are considered metanames.  However, the
directive "UndefinedMetaTags" does not apply to HTML tags.

HTML tags are whatever libxml2 says they are.  You can make up your own
fake tags, too, and they will be treated as metatags.
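A rough sketch of what treating tags as metanames means for indexing, again in Python with the standard html.parser standing in for libxml2. It assumes, following the "MetaNames table" example in this message, that words inside a declared metaname are indexed under that name and not under the default index (called swishdefault in this sketch). MetaNameIndexer, build_index and search are hypothetical names, not swish-e APIs.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaNameIndexer(HTMLParser):
    """Index words under each enclosing declared metaname (illustrative only)."""

    def __init__(self, metanames):
        super().__init__()
        self.metanames = {m.lower() for m in metanames}
        self.open_metas = []            # stack of currently open metaname tags
        self.index = defaultdict(set)   # metaname -> set of words

    def handle_starttag(self, tag, attrs):
        if tag in self.metanames:
            self.open_metas.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_metas:
            # Close back to the most recently opened matching tag
            idx = len(self.open_metas) - 1 - self.open_metas[::-1].index(tag)
            del self.open_metas[idx:]

    def handle_data(self, data):
        words = data.lower().split()
        # Assumption: text outside any metaname goes to the default index only
        targets = set(self.open_metas) or {"swishdefault"}
        for name in targets:
            self.index[name].update(words)


def build_index(html, metanames):
    parser = MetaNameIndexer(metanames)
    parser.feed(html)
    parser.close()
    return parser.index


def search(index, query):
    """'table=foo' searches the 'table' metaname; a bare 'foo' searches the default."""
    if "=" in query:
        field, word = query.split("=", 1)
    else:
        field, word = "swishdefault", query
    return word.lower() in index.get(field.lower(), set())
```

Indexing `<p>bar</p><table><tr><td>foo</td></tr></table>` with "table" declared, search(index, "table=foo") finds the word while search(index, "foo") does not, mirroring the two swish-e -w queries in this message.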

This change allows the above IgnoreMetaTags to work.  It also allows crazy
stuff like:

   MetaNames table

then

   swish-e -w table=foo  <-- search for "foo" only inside <table>
   swish-e -w foo        <-- won't find "foo" if it's only in <table>

or

   PropertyNames body

is a lot like

   StoreDescription HTML <body> 999999999

but uses "body" as the property name instead of "swishdescription".
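A rough sketch of the "PropertyNames body" case: collect the text of <body> and store it under a property name. Called with a length cap and the name "swishdescription", the same helper mimics the StoreDescription line. This is not swish-e's implementation; Python's html.parser stands in for libxml2, collapsing whitespace in the stored text is an assumption, and PropertyCollector/store_property are made-up names.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Gather the raw text found inside one tag (illustrative only)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # > 0 while inside the target tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def store_property(html, tag, name, max_len=None):
    """Return {name: text of <tag>}, whitespace collapsed, optionally truncated."""
    collector = PropertyCollector(tag)
    collector.feed(html)
    collector.close()
    text = " ".join("".join(collector.chunks).split())
    if max_len is not None:
        text = text[:max_len]
    return {name: text}
```

For example, store_property(page, "body", "body") plays the role of "PropertyNames body", while store_property(page, "body", "swishdescription", 999999999) corresponds to the StoreDescription line; the stored text is the same, only the property name differs.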


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Jan 11 23:36:02 2002