Re: difference in XML2 vs HTML2 ?

From: Peter Karman <karman(at)not-real.cray.com>
Date: Tue Feb 03 2004 - 17:12:16 GMT
Bill Moseley wrote:
> On Mon, Feb 02, 2004 at 10:02:38PM -0800, Peter Karman wrote:
> 
> 
>>The difference seems to be that the XML2 version splits words on tags,
>>while the HTML2 parser does not.
> 
> 
> That might be true in some cases.  It's been discussed on the list 
> before how to deal with 
> 
>      <tag>text</tag><tag>other</tag>
> 
> is that one or two words?
> 
> 

ah. So this:

http://swish-e.org/Discussion/archive/2003-12/6688.html

refers to this issue:

     *  Insert whitespace between tags: Parser.c was updated to flush the
        text buffer before and after every (non-inline HTML) tag.

         The problem was that:

             foo<tag>bar</tag>baz

         would index as a single word "foobarbaz".
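
To make the flushing behavior concrete, here is a minimal sketch of the idea in Python. It is not the code in Parser.c. The INLINE_TAGS set, the WordSplitter class, and the words() helper are all hypothetical names made up for illustration, and the real list of inline tags may well differ. It only shows how flushing the text buffer at non-inline tags turns one run of characters into separate words, while inline tags leave the word intact.

```python
from html.parser import HTMLParser

# Hypothetical list of inline tags, for illustration only.
# The list swish-e actually uses may be different.
INLINE_TAGS = {"a", "b", "i", "em", "strong", "span", "font", "u", "tt"}


class WordSplitter(HTMLParser):
    """Collect text, flushing the buffer around every non-inline tag."""

    def __init__(self):
        super().__init__()
        self.buffer = []  # text seen since the last flush
        self.chunks = []  # flushed pieces of text

    def flush(self):
        text = "".join(self.buffer)
        if text:
            self.chunks.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # Flush before a non-inline tag so adjacent text is not joined.
        if tag not in INLINE_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        # Flush after a non-inline tag as well.
        if tag not in INLINE_TAGS:
            self.flush()

    def handle_data(self, data):
        self.buffer.append(data)


def words(markup):
    """Return the words that would be indexed under this sketch."""
    parser = WordSplitter()
    parser.feed(markup)
    parser.close()
    parser.flush()  # pick up any text after the last tag
    return " ".join(parser.chunks).split()


print(words("foo<tag>bar</tag>baz"))  # ['foo', 'bar', 'baz']
print(words("foo<b>bar</b>baz"))      # ['foobarbaz']
```

With this approach, whether foo<tag>bar</tag>baz is one word or three depends entirely on whether "tag" is in the inline list, which is why the location of that list matters.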


Where is the list of non-inline HTML tags defined? In the libxml2 HTML 
parser, or in swish-e somewhere?



>>-h[option]
> 
> 
> 
> Someone still uses -h?
> 
> 

<grin> you mean instead of --help or something?


> No, did you look at the code in check_html_tag()?
> 

I will.

> As for the rest of your question... you will have to wait.  My wife says 
> I have to make the coffee.
> 

hope it was good and greasy.

pek

-- 
Peter Karman - Software Publications Programmer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Tue Feb 3 09:12:16 2004