Re: [SWISH-E:359] How to ignore <!DOCTYPE ...>?

From: Marjolein Katsma <webmaster(at)not-real.javawoman.com>
Date: Mon Jul 13 1998 - 16:35:31 GMT
Rob,

At 08:54 1998-07-13 -0700, Robert Rothenberg wrote:
>
>
>Is there any way to tell SWISH-E to ignore the <!DOCTYPE > declaration
>of HTML (and SGML/XML) files? It's particularly irritating to find out
>that the word "PUBLIC" is too common to be indexed because it occurs in
>the declaration.

Not with the current version, at least: it detects "comments" by looking
for <! only (it does not look for the -- -- delimiters), so a <!DOCTYPE ...>
declaration is treated like a comment. Currently there is also no way to
tell the program NOT to index comments.
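
To make the distinction concrete, here is a rough sketch in Python (not
SWISH-E's C, and not taken from the SWISH-E source). The function names
are made up for illustration. The first function mimics the behaviour
described above; the second also checks for the -- that opens a real
comment.

```python
def naive_is_comment(text, pos):
    """Behaviour described above: anything starting with <! counts as a comment."""
    return text.startswith("<!", pos)


def classify_markup(text, pos):
    """Hypothetical stricter check: only <!-- opens a comment."""
    if text.startswith("<!--", pos):
        return "comment"
    if text.startswith("<!", pos):
        # <!DOCTYPE ...>, <!ENTITY ...> and similar markup declarations
        return "declaration"
    if text.startswith("<", pos):
        return "tag"
    return "text"


doctype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">'
print(naive_is_comment(doctype, 0))           # the declaration is taken for a comment
print(classify_markup(doctype, 0))            # the stricter check tells them apart
print(classify_markup("<!-- a note -->", 0))
```

With the stricter check, a declaration can be skipped without touching
the way real comments are handled.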

>
>For that matter, it'd be a good idea to have an option to ignore SGML
>functions such as <!CDATA>, <!IGNORE>, etc. if there is not already an
>option to do so.
>
>Thanks,
>Rob
>

I'm working on the code (I started because I wanted some added
functionality, but I soon spotted some problems as well). I currently have
a reasonably stable intermediate version which solves this and a number of
other problems. If you or anyone else is interested, let me know: I have a
ZIP file with all the source (all my changes commented) and a readme
outlining changes and improvements, and I can mail it or post it somewhere.

Please realize that this is NOT in any way finished, but if you want to use
or test the code, feel free. I don't give real support (or guarantees!) for
this, but I certainly would appreciate comments.

BTW, ignoring <!CDATA> etc. is certainly covered in my version simply
because such tags are no longer treated as comments (you can also ignore
comments).
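
As an illustration only, the effect on the indexed text could be sketched
like this in Python. This is not the modified SWISH-E code. The name
indexable_text and the index_comments switch are hypothetical, and the
declaration pattern is a simplification that does not cope with a
DOCTYPE internal subset containing ">".

```python
import re

# Real comments: <!-- ... -->, possibly spanning several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Simplified markup declarations: <!DOCTYPE ...>, <!ENTITY ...>, and so on.
DECLARATION = re.compile(r"<![^>]*>")


def indexable_text(doc, index_comments=False):
    """Return doc with declarations removed and comments optionally kept."""
    if index_comments:
        # Keep the comment text, drop only the <!-- and --> delimiters.
        doc = COMMENT.sub(lambda m: " " + m.group(0)[4:-3] + " ", doc)
    else:
        doc = COMMENT.sub(" ", doc)
    # Comments are handled first so that <!-- is never matched as a declaration.
    return DECLARATION.sub(" ", doc)


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">\n'
    "<!-- draft note -->\n"
    "<p>Hello</p>"
)
print(indexable_text(page))
print(indexable_text(page, index_comments=True))
```

In both cases the word "PUBLIC" from the declaration never reaches the
indexer. Only the comment text depends on the switch.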

Meanwhile I'm working on the next stage...

Cheers,


Marjolein Katsma      webmaster@javawoman.com
Java Woman - http://javawoman.com/
Received on Mon Jul 13 09:45:07 1998