Re: Problem indexing OpenOffice files

From: Ivo Mans <swish(at)not-real.ivo.mans-manik.com>
Date: Tue May 20 2003 - 17:48:02 GMT
Bill Moseley wrote:

>On Tue, May 20, 2003 at 07:50:15AM -0700, Ivo Mans wrote:
>  
>
>>I'm trying to index OpenOffice files (on an otherwise perfectly working swish-e installation).
>>I've added the following lines to my config:
>>
>>FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
>>IndexContents XML* .sxw .sxc .sxg
>>StoreDescription XML <text> 20000
>>    
>>
>
>Try StoreDescription XML* so it matches up.
>
Just tried. No change.

>>Resulting in error message:
>>Warning: XML parse error in file './QU030423im01.sxw' line 2.  Error: not well-formed
>> (93 words)
>>
>>This goes for many or all of the OO files on our network, created with recent OO versions
>>(mostly the latest, v1.0.3.1). Inspected manually, the unzipped result looks like a fine
>>XML file to me, although it is too complex to be 100% sure.
>>
>>The unzipped content:
>>line 1: <?xml version="1.0" encoding="UTF-8"?>
>>line 2: All other data, including style definitions; this can be an extremely long line
>>    
>>
>
>Where's the opening tag?
>  
>
Here is the opening tag (as said, the original is all on one line):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content PUBLIC "-//OpenOffice.org//DTD 
OfficeDocument 1.0//EN" "office.dtd">
<office:document-content 
xmlns:office="http://openoffice.org/2000/office" 
xmlns:style="http://openoffice.org/2000/style" 
xmlns:text="http://openoffice.org/2000/text" 
xmlns:table="http://openoffice.org/2000/table" 
xmlns:draw="http://openoffice.org/2000/drawing" 
xmlns:fo="http://www.w3.org/1999/XSL/Format" 
xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:number="http://openoffice.org/2000/datastyle" 
xmlns:svg="http://www.w3.org/2000/svg" 
xmlns:chart="http://openoffice.org/2000/chart" 
xmlns:dr3d="http://openoffice.org/2000/dr3d" 
xmlns:math="http://www.w3.org/1998/Math/MathML" 
xmlns:form="http://openoffice.org/2000/form" 
xmlns:script="http://openoffice.org/2000/script" office:class="text" 
office:version="1.0">
    ...
</office:document-content>
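
Since the document is too long to check by eye, one way to narrow this down is to run the same content.xml through a standalone XML parser and see where it stops. The following is only a rough sketch using Python's standard zipfile module and its expat binding; the helper name check_content_xml is made up for illustration, and the parser build may not behave exactly like the one inside swish-e.

```python
import zipfile
import xml.parsers.expat


def check_content_xml(path_or_file):
    """Parse content.xml from an OpenOffice archive.

    Returns None if the XML is well-formed, otherwise a tuple
    (line, column, message) describing the first parse error.
    The function name is hypothetical, not part of swish-e.
    """
    # Equivalent of: unzip -p file.sxw content.xml
    with zipfile.ZipFile(path_or_file) as archive:
        data = archive.read("content.xml")

    parser = xml.parsers.expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None
```

Calling check_content_xml("QU030423im01.sxw") would return None for a well-formed file, or the line, column and message of the first error. Because everything after the XML declaration sits on line 2, the column is the useful part for finding the offending bytes.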



