[swish-e] StoreDescription XML

From: Peter Flynn <pflynn(at)not-real.ucc.ie>
Date: Wed Dec 01 2010 - 16:18:22 GMT

There is a much more serious problem in indexing XML documents.

Unlike HTML, where the element type most people are interested in is
<body>, the element types of an XML document are not fixed and can be
anything. In Word .docx files the text is in <w:body>, and in
OpenOffice .odt files it is in <office:body>, but in other XML
documents it could be in <article>, <book>, <report>, or virtually
anything else.

Is there a way to make the StoreDescription directive fall back to the
root element type, whatever it happens to be, if the named element type
is not present in a particular document?

If not, can this be put on the list? If we could be sure that namespace
prefixes like w: and office: were being ignored, a syntax such as this
would be good:

StoreDescription XML <body>,<article>,<book>,<>

where it would use one of the named element types if it existed, and
otherwise the <> would mean "the whole document".
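
To make the intended behaviour concrete, here is a rough sketch in
Python of the selection rule I have in mind. It is not Swish-e code and
says nothing about how Swish-e parses XML internally. The function
names, the default candidate list, and the choice to try candidates in
the order they are listed are all my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' part ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def element_text(elem):
    """Collect all text under elem and collapse runs of whitespace."""
    return " ".join(" ".join(elem.itertext()).split())


def description_text(xml_source, candidates=("body", "article", "book")):
    """Hypothetical helper: text of the first matching element, else the root.

    Candidates are tried in the order given (an assumption; the proposal
    only says "one of the named element types"). Namespaces are ignored,
    so <w:body> and <body> both match "body".
    """
    root = ET.fromstring(xml_source)
    for name in candidates:
        for elem in root.iter():  # document order, root included
            if local_name(elem.tag) == name:
                return element_text(elem)
    # Equivalent of the proposed "<>": use the whole document.
    return element_text(root)


docx_like = (
    '<w:document xmlns:w="urn:example:w">'
    "<w:body><w:p>Hello world</w:p></w:body>"
    "</w:document>"
)
report = "<report><title>Annual</title><para>Figures</para></report>"

print(description_text(docx_like))  # Hello world
print(description_text(report))     # Annual Figures
```

The second document has none of the named element types, so the text
comes from its root element <report>, which is the fallback being asked
for.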

///Peter
Received on Wed Dec 1 11:18:28 2010