Hiding <script> and CDATA (was: swish-e-2.1-dev-25-2002-01-09)

From: Bill Moseley <moseley(at)>
Date: Fri Jan 11 2002 - 17:23:32 GMT
Ok, I need a little help.

First, keep in mind there are two HTML parsers available in swish-e.  The
original HTML parser is broken in a number of ways (but I think in this
case you can see that working in your favor ;).

Swish-e can also use the HTML2 parser (if compiled with libxml2).  This is
a much more accurate parser.

To hide <script> content from old browsers you use this trick:

<SCRIPT type="text/javascript">
<!--  to hide script contents from old browsers
  function square(i) {
    document.write("The call passed ", i ," to the function.","<BR>")
    return i * i
  }
  document.write("The function returned ",square(5),".")
// end hiding contents from old browsers  -->
</SCRIPT>

Now, that actually works with swish-e's HTML parser because it sees the
comment and ignores that text.

But with libxml2 (HTML2), everything inside the <script> is *correctly*
seen as CDATA, hence it's not really an HTML comment.
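
To make the difference concrete, here is a rough sketch in Python (not
swish-e code).  The naive_text function is a hypothetical stand-in for a
loose parser that strips anything that looks like a comment, and Python's
standard html.parser plays the role of a parser that treats <script>
content as CDATA.  Neither one is the actual swish-e or libxml2
implementation; the names PAGE, naive_text and Collector are made up for
this illustration.

```python
import re
from html.parser import HTMLParser

PAGE = """<SCRIPT type="text/javascript">
<!--  to hide script contents from old browsers
  function square(i) { return i * i }
// end hiding contents from old browsers  -->
</SCRIPT>
<P>Visible text</P>"""


def naive_text(html):
    """Hypothetical loose parser: drop comment-like spans, then drop tags."""
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    return re.sub(r"<[^>]*>", "", without_comments).split()


class Collector(HTMLParser):
    """Records what the standard-library parser reports as comments and data."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.data = []

    def handle_comment(self, text):
        self.comments.append(text)

    def handle_data(self, text):
        self.data.append(text)


collector = Collector()
collector.feed(PAGE)
collector.close()
strict_text = "".join(collector.data).split()

print(naive_text(PAGE))    # ['Visible', 'text']
print(collector.comments)  # [] because nothing inside <script> is a comment
print(strict_text)         # the script source words, then the page text
```

The loose approach never sees the script source, while the CDATA-aware
parser hands all of it over as ordinary character data.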

Now, here's where I need help.  I'm sorry I can't remember, but someone on
the list in September had a situation where CDATA was not being indexed.
So I added a handler to treat CDATA as normal text as far as swish is
concerned.

So that means anything inside of <script>, even the "comments" will be
indexed (and added to any properties, such as "swishdescription").

So, my question is: How to deal with CDATA?  Are there times when you want
to index CDATA and other times when you don't?
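
One way to picture the choice is as a per-run switch.  The sketch below is
Python, not swish-e code, and the index_cdata flag, the IndexableText class
and the words_to_index helper are all hypothetical names invented for this
illustration; it is not an existing swish-e option.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collects words to index; index_cdata is a hypothetical switch."""

    RAW_TEXT_TAGS = {"script", "style"}

    def __init__(self, index_cdata):
        super().__init__()
        self.index_cdata = index_cdata
        self.open_raw_tag = None  # name of the raw-text element we are inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.RAW_TEXT_TAGS:
            self.open_raw_tag = tag

    def handle_endtag(self, tag):
        if tag == self.open_raw_tag:
            self.open_raw_tag = None

    def handle_data(self, text):
        # Keep ordinary text always; keep raw-text content only on request.
        if self.open_raw_tag is None or self.index_cdata:
            self.words.extend(text.split())


def words_to_index(html, index_cdata):
    parser = IndexableText(index_cdata)
    parser.feed(html)
    parser.close()
    return parser.words


PAGE = "<script>var hidden = 1</script><p>Visible text</p>"
print(words_to_index(PAGE, index_cdata=False))  # ['Visible', 'text']
print(words_to_index(PAGE, index_cdata=True))
```

With the switch off only the page text is indexed; with it on, the script
source is indexed as well, which is the current behavior described above.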

Bill Moseley
