Ok, I need a little help.
First, keep in mind that there are two HTML parsers available in swish-e.
The original HTML parser is broken in a number of ways (though in this
case I think you can see that working in your favor ;).
Swish-e can also use the HTML2 parser (if compiled with libxml2). This is
a much more accurate parser.
To hide <script> content from old browsers, you use this trick:
[ http://www.w3.org/TR/html4/interact/scripts.html#h-18.2.1 ]
<SCRIPT type="text/javascript">
<!-- to hide script contents from old browsers
function square(i) {
document.write("The call passed ", i ," to the function.","<BR>")
return i * i
}
document.write("The function returned ",square(5),".")
// end hiding contents from old browsers -->
</SCRIPT>
Now, that actually works with swish-e's HTML parser because it sees the
comment and ignores that text.
But with libxml2 (HTML2), everything inside the <script> is *correctly*
seen as CDATA, hence it's not really an HTML comment.
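To illustrate the principle (this is not swish-e or libxml2 code), here is
a minimal sketch using Python's standard html.parser, which also treats
<script> content as raw character data. The names Recorder, scan and SAMPLE
are made up for this example. The "comment" inside <script> arrives through
the text callback, while a comment outside <script> arrives through the
comment callback.

```python
from html.parser import HTMLParser

# Hypothetical sample: one "hidden" script and one genuine HTML comment.
SAMPLE = """<p>Visible text</p>
<SCRIPT type="text/javascript">
<!-- to hide script contents from old browsers
function square(i) { return i * i }
// end hiding contents from old browsers -->
</SCRIPT>
<!-- a real comment -->"""


class Recorder(HTMLParser):
    """Record which callback each piece of the document arrives through."""

    def __init__(self):
        super().__init__()
        self.comments = []  # data reported as HTML comments
        self.text = []      # data reported as character data

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_data(self, data):
        # Ignore whitespace-only runs between tags.
        if data.strip():
            self.text.append(data.strip())


def scan(html):
    """Parse html and return the Recorder holding comments and text."""
    parser = Recorder()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    result = scan(SAMPLE)
    print("comments:", result.comments)
    print("text:", result.text)
```

Running it shows the script body, including the "<!--" line, in the text
list and only the last comment in the comments list.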
Now, here's where I need help. I'm sorry I can't remember, but someone on
the list in September had a situation where CDATA was not being indexed.
So I added a handler to treat CDATA as normal text as far as swish is concerned.
So that means anything inside of <script>, even the "comments" will be
indexed (and added to any properties, such as "swishdescription").
So, my question is: how should swish-e deal with CDATA? Are there times
when you want to index CDATA and other times when you don't?
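One possible shape for an answer, sketched in Python rather than in
swish-e's C and purely as an assumption about what a per-element policy
could look like: index character data by default, but skip it inside a
configurable set of elements. IndexTextExtractor, extract_words and
skip_elements are hypothetical names, not swish-e settings.

```python
from html.parser import HTMLParser


class IndexTextExtractor(HTMLParser):
    """Collect words for indexing, skipping text inside chosen elements."""

    def __init__(self, skip_elements=("script", "style")):
        super().__init__()
        self.skip_elements = set(skip_elements)
        self.current_skip = None  # name of the skipped element we are inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if self.current_skip is None and tag in self.skip_elements:
            self.current_skip = tag

    def handle_endtag(self, tag):
        if tag == self.current_skip:
            self.current_skip = None

    def handle_data(self, data):
        # Only text outside a skipped element is indexed.
        if self.current_skip is None:
            self.words.extend(data.split())


def extract_words(html, skip_elements=("script", "style")):
    """Return the list of words that would be indexed for html."""
    parser = IndexTextExtractor(skip_elements)
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    page = "<p>hello</p><script><!-- function square(i){} --></script><p>world</p>"
    print(extract_words(page))                    # script text skipped
    print(extract_words(page, skip_elements=()))  # script text indexed
```

Passing an empty skip_elements gives the "index all CDATA" behavior, so
the same handler covers both cases depending on configuration.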
--
Bill Moseley
mailto:moseley@hank.org
Received on Fri Jan 11 17:24:18 2002