I've built swish-e 2.2RC1 with libxml2 2.23.4 on Mac OS X 10.1.5, and have
been experimenting with it for the past two days.
I have a few questions about swish-e's XML support:
1) I plan to return search results as XML. However, according to the
Configuration File Directives documentation
(http://swish-e.org/2.2/docs/SWISH-CONFIG.html#Document_Contents_Directives),
entities in XML documents are converted regardless of the value of
ConvertHTMLEntities:
"NOTE: Entities within XML files and files parsed with libxml2 are
converted regardless of this setting."
My current workaround for this is to build an XML result string, then pass
it through Tidy (http://tidy.sourceforge.net/) to re-escape entities.
I'd rather not do this if at all possible. Is there a way to have swish-e
leave the entities unconverted?
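
For illustration, here is a minimal sketch of doing the re-escaping while
the result string is built, rather than in a separate Tidy pass. It assumes
the results are assembled in Python; the element names (result, title,
description) and the function name are hypothetical and not part of swish-e.

```python
from xml.sax.saxutils import escape, quoteattr


def result_to_xml(path, title, description):
    """Build one <result> element from already-decoded swish-e output.

    escape() turns &, < and > back into entities; quoteattr() does the
    same for an attribute value and adds the surrounding quotes.
    The element names here are hypothetical.
    """
    return "<result path=%s><title>%s</title><description>%s</description></result>" % (
        quoteattr(path),
        escape(title),
        escape(description),
    )


if __name__ == "__main__":
    print(result_to_xml("/docs/a.xml", "R&D", "x < y"))
```

This only covers the characters that must be escaped in XML text and
attribute values; it does not restore named entities from the source.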
2) I'm indexing XML source documents in the file system. I can configure
swish-e to use the first 100 characters of the document's root element,
'page', as the description:
PropertyNamesMaxLength 100 swishdescription
PropertyNameAlias swishdescription page
However, when swish-e builds the index, the description includes the
attribute values as well as the text nodes of 'page'.
It's not clear how to turn that off in the configuration file.
I'd also like to specify a location in the document to use as the
description, i.e. /page/section/para.
The workaround here would be to use the prog method to load pages and use
an XPath tool to extract that location for use as the page description.
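
A rough sketch of that workaround, assuming a Python program feeds swish-e
through the prog method. The function names are hypothetical. The header
names (Path-Name, Content-Length, Last-Mtime, Document-Type) are written as
I understand the prog input format and should be checked against the
SWISH-RUN documentation before use.

```python
import xml.etree.ElementTree as ET


def extract_description(xml_text, path="section/para", max_len=100):
    """Return the text under /page/<path>, ignoring attribute values.

    `path` is relative to the root element, so /page/section/para
    becomes "section/para". Whitespace runs are collapsed.
    """
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None:
        return ""
    # itertext() yields text nodes only, never attribute values.
    text = " ".join("".join(node.itertext()).split())
    return text[:max_len]


def prog_record(path_name, content, mtime, doc_type="XML"):
    """Format one document for the prog method (assumed header layout)."""
    body = content.encode("utf-8")
    header = (
        "Path-Name: %s\n"
        "Content-Length: %d\n"  # length of the body in bytes
        "Last-Mtime: %d\n"
        "Document-Type: %s\n"
        "\n"
    ) % (path_name, len(body), mtime, doc_type)
    return header.encode("utf-8") + body


if __name__ == "__main__":
    doc = '<page id="p1"><section><para class="lead">Hello world</para></section></page>'
    print(extract_description(doc))
```

How the extracted text is then mapped onto swishdescription (for example
through a dedicated element and PropertyNameAlias) is left open here.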
Bill Humphries <email@example.com>
Webmaster, HR Systems
Received on Fri Sep 6 00:50:12 2002