
Re: Indexing xml files that has another included xml

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Sep 10 2004 - 14:41:58 GMT
Bernhard Weisshuhn wrote on 9/10/04 9:08 AM:
> On Fri, Sep 10, 2004 at 06:54:30AM -0700, Peter Karman <karman@cray.com> wrote:
> 
> 
>>Bill Moseley wrote on 9/9/04 2:04 AM:
>>
>>
>>>Which, of course, we use the SAX interface.  I also see on
>>>
>>>  http://www.xmlsoft.org/html/index.html
>>>
>>>that our SAX usage of libxml2 is deprecated.  Looks like a trip to the
>>>xml list might be in my future.
>>
>>If you do consider rewriting swish-e to use the DOM interface, consider 
>>making it optional/configurable. I suspect that folks use swish-e with 
>>XML that might be derived from a database (which SAX seems better for), 
>>as well as 'real' XML documents (which DOM seems better for -- as in 
>>this case with resolving entities).
> 
> 
> I seriously doubt whether using the DOM interface would solve more problems
> than it would create. Some xml files get *huge*, and might be indexed
> for exactly that reason. Indexing huge files via DOM will drive
> indexing speed down and resource requirements up. Maintaining both
> interfaces within swish-e drives the load on our cherished developers
> up, something we also don't want, do we?
> 
> I personally find filtering stuff through xmllint acceptable; swish-e
> users are used to filtering all kinds of documents prior to indexing.
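To make the memory argument concrete, here is a minimal Python sketch. It is not swish-e code, which drives libxml2 from C. It contrasts the two parsing styles on a tiny, made-up sample document. The SAX handler receives one callback per event and keeps only running totals. The DOM version has to build the whole tree in memory before it can read any of it, which is why DOM costs grow with file size.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample input; real indexing inputs may be many megabytes.
SAMPLE = b"<docs><doc>alpha beta</doc><doc>gamma</doc><doc>delta epsilon</doc></docs>"


class DocCounter(xml.sax.ContentHandler):
    """SAX (streaming) style: keeps only small running totals,
    never the document itself."""

    def __init__(self):
        super().__init__()
        self.docs = 0
        self.text_chars = 0

    def startElement(self, name, attrs):
        if name == "doc":
            self.docs += 1

    def characters(self, content):
        # Text can arrive in several chunks; summing lengths is chunk-safe.
        self.text_chars += len(content)


def count_with_sax(data: bytes) -> tuple[int, int]:
    handler = DocCounter()
    xml.sax.parseString(data, handler)
    return handler.docs, handler.text_chars


def count_with_dom(data: bytes) -> tuple[int, int]:
    # DOM style: the entire tree is materialized first, so memory use
    # scales with the size of the input document.
    dom = minidom.parseString(data)
    docs = dom.getElementsByTagName("doc")
    text_chars = sum(
        len(node.data)
        for doc in docs
        for node in doc.childNodes
        if node.nodeType == node.TEXT_NODE
    )
    return len(docs), text_chars
```

Both functions return `(3, 28)` for `SAMPLE`. The results match, but only the SAX path avoids holding the whole document in memory at once.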

Point(s) well taken (esp. the cherished developers). Filtering through
xmllint seems like a better solution and is consistent with the filter
model.
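The filter model boils down to one step: resolve the included XML first, then hand the indexer a single flattened document. The sketch below shows that step in Python as a stand-in for running xmllint. It enables the standard library's `feature_external_ges` so that external general entities get expanded inline, which is roughly what `xmllint --noent` does for such entities. The function name `flatten_entities`, the `demo` helper, and the two-file layout are all hypothetical. The code also does not show how such a filter would be registered in a swish-e configuration.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import feature_external_ges
from xml.sax.saxutils import XMLGenerator


def flatten_entities(path: str) -> str:
    """Hypothetical pre-filter: re-serialize the XML at `path` with
    external general entities expanded inline, so a downstream indexer
    sees one self-contained document."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # Disabled by default in modern Python for safety; enable for trusted input only.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    parser.parse(path)
    return out.getvalue()


def demo() -> str:
    # Hypothetical two-file document: main.xml pulls in chap1.xml through an
    # external entity, i.e. an xml file that includes another xml file.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "chap1.xml"), "w", encoding="utf-8") as f:
            f.write("<chapter>Included text</chapter>")
        main = os.path.join(tmp, "main.xml")
        with open(main, "w", encoding="utf-8") as f:
            f.write('<!DOCTYPE book [<!ENTITY chap1 SYSTEM "chap1.xml">]>'
                    "<book><title>Main</title>&chap1;</book>")
        return flatten_entities(main)
```

In the output of `demo()`, the `&chap1;` reference is replaced by the `<chapter>` element read from chap1.xml. The parse still streams through SAX, so this approach avoids the DOM costs discussed above.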


-- 
Peter Karman  651-605-9009  karman@cray.com
Received on Fri Sep 10 07:42:27 2004