Re: [SWISH-E:359] How to ignore <!DOCTYPE ...>?

From: Marjolein Katsma <webmaster(at)>
Date: Mon Jul 13 1998 - 16:35:31 GMT

At 08:54 1998-07-13 -0700, Robert Rothenberg wrote:
>Is there any way to tell SWISH-E to ignore the <!DOCTYPE > declaration of
>HTML (and SGML/XML) files? It's particularly irritating to find out that the
>word "PUBLIC" is too common to be indexed because it occurs in the
>declaration.

Not with the current version, at least: it detects "comments" by looking for
<! only (it does not look for -- --), so a declaration such as <!DOCTYPE >
is treated as a comment. Currently there is also no way to tell the program
NOT to index comments.
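
To illustrate the distinction (this is not SWISH-E code, nor the modified
version discussed in this thread), here is a minimal Python sketch of a
pre-indexing filter that treats only <!-- ... --> as a comment and drops every
other <! ... > declaration. The names MARKUP_RE, strip_declarations and
index_comments are hypothetical, and the declaration pattern is a
simplification that assumes no ">" appears inside the declaration.

```python
import re

# Hypothetical sketch, not taken from the SWISH-E source.
# First alternative: a real SGML/HTML comment, <!-- ... -->.
# Second alternative: any other markup declaration, e.g. <!DOCTYPE ...>.
# Simplification: a DOCTYPE with an internal subset ([ ... ]) may contain
# ">" characters and would not be matched in full by this pattern.
MARKUP_RE = re.compile(r"<!--(.*?)-->|<![^>]*>", re.DOTALL)


def strip_declarations(text, index_comments=True):
    """Return text with declarations removed before indexing.

    Comment bodies are kept as indexable text when index_comments is True
    and dropped otherwise. Declarations are always dropped.
    """
    def replace(match):
        comment_body = match.group(1)  # None when a declaration matched
        if comment_body is not None and index_comments:
            return " " + comment_body + " "
        return " "

    return MARKUP_RE.sub(replace, text)


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">'
        "<!-- public notes --><p>Library hours</p>"
    )
    print(strip_declarations(sample))
    print(strip_declarations(sample, index_comments=False))
```

With a filter along these lines, the word "PUBLIC" in the declaration never
reaches the indexer, while the choice of whether to index comment text
remains a separate option.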

>For that matter, it'd be a good idea to have an option to ignore SGML
>functions such as <!CDATA>, <!IGNORE>, etc. if there is not already an
>option to do so.

I'm working on the code (which I started doing because I wanted some added
functionality but I soon spotted some problems as well); currently I have a
reasonably stable intermediate version which solves this and a number of
other problems. If you or anyone else is interested, let me know - I have a
ZIP file with all the source (all my changes commented) and a readme
outlining changes and improvements; I can mail it or post it somewhere.

Please realize that this is NOT in any way finished, but if you want to use
or test the code, feel free. I don't give real support (or guarantees!) for
this, but I certainly would appreciate comments.

BTW, ignoring <!CDATA> etc. is certainly covered in my version simply
because such tags are no longer treated as comments (you can also ignore

Meanwhile I'm working on the next stage...


Marjolein Katsma
Java Woman -
