Bill Moseley wrote on 10/27/04 9:37 AM:
> On Tue, Oct 26, 2004 at 11:42:09PM -0700, Stein-Egil Museus wrote:
>
>><row><entry><para>559999</para></entry><entry><para>Some text</para></entry></row>
>
>
> Anyone know what the XML spec says about this? How do you know which
> tags should split text?
short answer: you don't.
http://www.w3.org/TR/2000/REC-xml-20001006#sec-white-space
n.b.:
<snip>
An XML processor must always pass all characters in a document that are
not markup through to the application. A validating XML processor must
also inform the application which of these characters constitute white
space appearing in element content.
A special attribute named xml:space may be attached to an element to
signal an intention that in that element, white space should be
preserved by applications. In valid documents, this attribute, like any
other, must be declared if it is used. When declared, it must be given
as an enumerated type whose values are one or both of "default" and
"preserve".
</snip>
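
To make the xml:space part concrete, here is a rough sketch in Python (not swish-e code) of how an application could work out the effective xml:space value for each element. It assumes the standard-library ElementTree parser, which is non-validating, so it does not check that the attribute is declared. The helper name effective_space and the sample document are made up for illustration; the value is inherited by child elements unless they override it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml:" prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def effective_space(elem, inherited="default"):
    """Yield (element, mode) pairs; a child inherits the mode unless it sets its own."""
    mode = elem.get(XML_SPACE, inherited)
    yield elem, mode
    for child in elem:
        yield from effective_space(child, mode)

# Hypothetical sample document: only <row> asks for white space to be preserved.
doc = ET.fromstring(
    '<table><row xml:space="preserve"><entry>559999</entry>'
    '<entry>Some text</entry></row><note>other</note></table>'
)
modes = {el.tag: mode for el, mode in effective_space(doc)}
print(modes)
```

Note that this only tells you what the document author asked for regarding white space. It says nothing about whether a tag should separate words.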
>
> With HTML some tags are block level and some are inline:
>
> Libxml2 provides a way to tell the difference.
>
and I believe Bill put that fix into what will be the 2.4.3 release, to
increment position on HTML block elements.
but for XML, I think you have to either:
1. always bump position on a new tag, or
2. explore the 'xml:space' attribute a little more. Maybe that could be
used in swish to indicate whether word position should be bumped or not?
Something like:
XMLBumpPositionAttr 0|1
and if set to 1, bump position?
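
Here is a minimal sketch of what option 1 would mean, written in Python rather than taken from the swish-e source. The function names (chunks, word_positions) and the bump_on_tag flag are hypothetical, and the word pattern \w+ is only a stand-in for whatever word rules the indexer really uses. With the flag off, text from adjacent elements runs together; with it on, every tag ends the current word and leaves a one-position gap so a phrase cannot match across the tag.

```python
import re
import xml.etree.ElementTree as ET

def chunks(elem):
    """Yield text in document order, with None marking every start or end tag."""
    yield None
    if elem.text:
        yield elem.text
    for child in elem:
        yield from chunks(child)
        if child.tail:
            yield child.tail
    yield None

def word_positions(xml_text, bump_on_tag):
    """Return (word, position) pairs; bump_on_tag is a hypothetical switch."""
    words = []
    pos = 0
    pending = ""

    def flush(text):
        nonlocal pos
        found = re.findall(r"\w+", text)
        for word in found:
            pos += 1
            words.append((word, pos))
        return bool(found)

    for chunk in chunks(ET.fromstring(xml_text)):
        if chunk is not None:
            pending += chunk
        elif bump_on_tag:
            if flush(pending):
                pos += 1  # gap so phrases cannot span the tag
            pending = ""
    flush(pending)
    return words

row = ("<row><entry><para>559999</para></entry>"
       "<entry><para>Some text</para></entry></row>")
print(word_positions(row, bump_on_tag=False))  # [('559999Some', 1), ('text', 2)]
print(word_positions(row, bump_on_tag=True))   # [('559999', 1), ('Some', 3), ('text', 4)]
```

The trade-off is that bumping on every tag also splits words that contain inline markup, which is why a per-document or per-element switch might be worth having.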
--
Peter Karman . http://www.cray.com/craydoc/ . karman(at)not-real.cray.com
Received on Wed Oct 27 07:55:54 2004