
Re: Problem indexing OpenOffice files

From: Ivo Mans <swish(at)not-real.ivo.mans-manik.com>
Date: Wed May 21 2003 - 08:19:46 GMT
Douglas Smith wrote:

>Yes, I would like to know more about this.  I got this to work, but not
>in a nice way.  I used the same filter line, with the "unzip content.xml"
>and there was lots of xml to parse.  But XML2 would return nothing for
>some reason, and no content would get indexed.
>
After some back-and-forth by email with Bill Moseley, our server indexed
quite successfully last night with the following configuration:

FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
IndexContents XML* .xml .sxw .sxc .sxg
StoreDescription XML* <text:p> 20000
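The FileFilterMatch line makes swish-e run unzip -p on each matching file. That writes the content.xml member of the archive to standard output, and swish-e then parses that output as XML. The Python sketch below only illustrates what the filter step hands to swish-e; it is not part of swish-e. The function and script names are hypothetical. The text namespace URI is assumed to be the one OpenOffice.org 1.x files use; OpenDocument files use a different one. paragraphs() lists the text of the <text:p> elements, which are the elements the StoreDescription line names.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Assumed OpenOffice.org 1.x text namespace (the "text:" prefix in content.xml).
TEXT_NS = "http://openoffice.org/2000/text"

def extract_content_xml(path):
    """Return the raw bytes of content.xml, like `unzip -p path content.xml`."""
    with zipfile.ZipFile(path) as zf:
        return zf.read("content.xml")

def paragraphs(content_xml):
    """Return the text of every <text:p> element, in document order."""
    root = ET.fromstring(content_xml)
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sxw_filter.py file.sxw > content.xml
    sys.stdout.buffer.write(extract_content_xml(sys.argv[1]))
```

Run as a script on an .sxw file, it writes the same bytes to standard output as unzip -p does for content.xml.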

All our OpenOffice files on the network now seem to be stored with a
description. Of course, this covers only the plain text in the
documents. OpenOffice metadata such as 'Author' is not included, since
OpenOffice stores it in a separate XML file. I might write some kind of
"oo2xml" script one of these days to deliver a more informative XML
document to swish-e.
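An "oo2xml" script like the one mentioned above could merge the metadata file and the body text into a single XML document. The Python sketch below is hypothetical: the function name, the output element names (document, title, creator, body, p), and the assumption that the metadata lives in meta.xml under Dublin Core dc:title and dc:creator elements are illustrative choices, not anything swish-e or the original post defines. The text namespace URI is assumed to be the OpenOffice.org 1.x one.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespaces for OpenOffice.org 1.x files (OpenDocument uses other URIs).
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "text": "http://openoffice.org/2000/text",
}

def oo2xml(path):
    """Hypothetical oo2xml: merge metadata and paragraph text into one XML string."""
    with zipfile.ZipFile(path) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        content = ET.fromstring(zf.read("content.xml"))

    doc = ET.Element("document")
    # Copy selected Dublin Core fields from meta.xml, skipping missing or empty ones.
    for tag in ("title", "creator"):
        el = meta.find(f".//dc:{tag}", NS)
        if el is not None and el.text:
            ET.SubElement(doc, tag).text = el.text
    # Collect the text of every <text:p> paragraph from content.xml.
    body = ET.SubElement(doc, "body")
    for p in content.iter(f"{{{NS['text']}}}p"):
        ET.SubElement(body, "p").text = "".join(p.itertext())
    return ET.tostring(doc, encoding="unicode")

if __name__ == "__main__" and len(sys.argv) > 1:
    print(oo2xml(sys.argv[1]))
```

Because it writes a merged document to standard output, a script like this could in principle take the place of unzip in a filter, with the swish-e directives adjusted to the new element names.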

However, the error message mentioned in the original post remained:

	Warning: XML parse error in file './QU030423im01.sxw' line 2.  Error: not well-formed
Although it didn't seem to affect the indexing, it annoyed me.

It turned out my swish-e was not compiled with libxml2 support (the XML2 parser).
I upgraded libxml2 and recompiled swish-e, and the error message has now disappeared.

Kind regards,
Ivo Mans
Received on Wed May 21 08:19:54 2003