There is a much more serious problem with indexing XML documents.
Unlike HTML, where the element type most people are interested in is
<body>, the element types of an XML document are not fixed, and can be
anything. In the case of Word .docx files the text is in <w:body> and
for OpenOffice .odt files it is <document:body>, but in other XML
documents it could be <article>, <book>, <report>, or virtually anything.
Is there a way to make the StoreDescription directive fall back to the root
element type, whatever it happens to be, when the named element type does
not occur in a particular document?
If not, can this be put on the list? If we could be sure that namespace
prefixes like w: and document: were being ignored, a syntax such as this would be useful:
StoreDescription XML <body>,<article>,<book>,<>
where it would use one of the named element types if the document contained
it, and otherwise the <> would mean "the whole document".
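To make the proposed semantics concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not Swish-e code and says nothing about how Swish-e works internally. It assumes three things: candidates are tried left to right, namespace prefixes are ignored by comparing local names only, and an empty entry (from <>) selects the whole document. The function names parse_spec and description_text, the max_len parameter, and the namespace URI in the usage example are all hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any namespace: '{http://...}body' -> 'body'."""
    # ElementTree expands prefixes such as w: into '{uri}local' form,
    # so stripping the '{uri}' part ignores the namespace entirely.
    return tag.rsplit("}", 1)[-1] if tag.startswith("{") else tag


def parse_spec(spec):
    """Hypothetical parser: '<body>,<article>,<>' -> ['body', 'article', '']."""
    return [item.strip().strip("<>") for item in spec.split(",")]


def description_text(xml_text, candidates, max_len=None):
    """Return text of the first candidate element found, or None.

    Candidates are tried in order (an assumption of this sketch);
    '' (from '<>') selects the root element, i.e. the whole document.
    """
    root = ET.fromstring(xml_text)
    chosen = None
    for wanted in candidates:
        if wanted == "":
            chosen = root
        else:
            # root.iter() walks the whole tree, including the root itself.
            chosen = next(
                (el for el in root.iter() if local_name(el.tag) == wanted), None
            )
        if chosen is not None:
            break
    if chosen is None:
        return None
    # Join text nodes with spaces and collapse whitespace runs (a
    # simplification: a word split across inline runs may gain a space).
    text = " ".join(" ".join(chosen.itertext()).split())
    return text[:max_len] if max_len is not None else text
```

With spec = parse_spec("<body>,<article>,<book>,<>"), a .docx-style document using <w:body> matches "body" despite the prefix. A document such as <memo>...</memo> matches none of the names and falls through to <>, so its whole text is used.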
Users mailing list
Received on Wed Dec 1 11:18:28 2010