On Tue, May 20, 2003 at 01:23:16PM -0700, Douglas Smith wrote:
> Yes, I would like to know more about this. I got this to work, but not
> in a nice way. I used the same filter line, with the "unzip content.xml"
> and there was lots of xml to parse. But XML2 would return nothing for
> some reason, and no content would get indexed. I switched to HTML2,
> which indexed a bunch of junk along with the content, but got all the
> content that users wanted, so I left it as a kludge until I could
> fix the XML2.
Perhaps two different problems:
I had Ivo Mans send me the OO file, and I uncompressed it and then indexed with:
swish-e -i content.xml -T indexed_words parsed_tags
That showed me that the text was in <text:p> tags, not <text>.
I then used a config file of:
StoreDescription XML* <text:p> 20000
and it then stored the description.
I did not spend much time looking at the xml, so other tags may need to be
set up as aliases (perhaps <text:s>). -T parsed_tags isn't as helpful as I expected -- I
thought it used to indent and show ending tags. Oh well.
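If -T parsed_tags does not make it obvious which tags hold the text, something like the following might help. It is only a rough sketch and not part of swish-e: the function names text_bearing_tags and tags_in_document are made up for illustration, and it assumes the document is a zip archive with a content.xml member, as in the unzip filter above. It counts how many characters of text sit directly inside each element, keyed by the prefixed tag name.

```python
import io
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET


def text_bearing_tags(xml_bytes):
    """Count characters of text directly inside each element, by prefixed tag name."""
    prefixes = {}       # namespace URI -> prefix, taken from the xmlns declarations
    counts = Counter()
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "end"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes.setdefault(uri, prefix)
            continue
        elem = payload
        text = (elem.text or "").strip()
        if not text:
            continue
        if elem.tag.startswith("{"):
            # ElementTree reports namespaced tags as "{uri}local"
            uri, local = elem.tag[1:].split("}", 1)
            prefix = prefixes.get(uri, "")
            name = f"{prefix}:{local}" if prefix else local
        else:
            name = elem.tag
        counts[name] += len(text)
    return counts


def tags_in_document(path, member="content.xml"):
    """Hypothetical helper: read one member straight out of the zipped document."""
    with zipfile.ZipFile(path) as archive:
        return text_bearing_tags(archive.read(member))


if __name__ == "__main__":
    import sys
    for name, chars in tags_in_document(sys.argv[1]).most_common():
        print(f"{chars:8d}  {name}")
```

The tag with the largest count is a reasonable first candidate for StoreDescription. Note that text following an inline element (for example after a <text:s/>) is not counted here, so the numbers are only a rough guide.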
The other problem is that Ivo was seeing an error from the parser -- that might be due to
all the xml on a single line. I did not have that problem, but I'm using a newer version of libxml2:
$ xml2-config --version
Received on Tue May 20 20:37:07 2003