On Tue, May 20, 2003 at 07:50:15AM -0700, Ivo Mans wrote:
> I'm trying to index OpenOffice files (on an otherwise perfectly working swish-e installation).
> I've added the following lines to my config:
> FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
> IndexContents XML* .sxw .sxc .sxg
> StoreDescription XML <text> 20000
XML is one parser, based on expat.
XML2 is another parser, based on libxml2.
XML* means use the libxml2 parser if available, but fall back to expat otherwise.
So IndexContents XML* is really XML2 if you have libxml2 installed, but you are
using StoreDescription XML, which names the expat parser. Try StoreDescription XML*
so the two directives match up.
It's confusing, yes.
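Something like this, as an untested sketch. It assumes your build accepts
XML* in StoreDescription the same way it does in IndexContents; the other
lines are copied unchanged from your message:

    FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
    IndexContents XML* .sxw .sxc .sxg
    StoreDescription XML* <text> 20000

If you know libxml2 is installed, naming XML2 explicitly in both
IndexContents and StoreDescription should have the same effect and makes
the pairing obvious.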
> Resulting in error message:
> Warning: XML parse error in file './QU030423im01.sxw' line 2. Error: not well-formed
> (93 words)
> This happens for many or all of the OO files on our network, created with recent OO versions
> (mostly the latest v.126.96.36.199). Inspected manually, the unzipped result looks like a fine
> XML file to me, although it is too complex to be 100% sure.
> The unzipped content:
> line 1: <?xml version="1.0" encoding="UTF-8"?>
> line 2: All other data, including style definitions; this can be an extremely long line
Where's the opening tag?
<?xml version="1.0" encoding="UTF-8"?>
Received on Tue May 20 15:50:35 2003