On Thu, Mar 24, 2005 at 01:08:21AM -0800, Roman Chyla wrote:
> Hi,
>
> libxml2 converts the stuff into utf8 and then sends it to swish-e in
> iso8859-1. I was looking at libxml2 site, and found it is possible to
> compile it with iconv support (or more, tell libxml2 to output the
> document in the original encoding).
That only applies when writing the tree back out to an XML file. Swish-e
uses libxml2's SAX parser, so it always receives the data in UTF-8.
> Is it possible to change the way libxml2 outputs to swish-e?
No, but you could change to a different 8-bit encoding for indexing.
> this would help me to use HTML2 parser even for non-iso8859-1 documents.
> However, what should I look at? How can I do it (if I can)?
In parser.c, look at the function Convert_to_latin1(). You would need to
replace the call to (libxml's) UTF8Toisolat1() with another function
-- perhaps an iconv function (and adjust the following code to work
with that function).
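
As a rough illustration of that idea, here is a minimal sketch of an
iconv-based converter. The name utf8_to_8bit() and its charset parameter are
made up for this example and are not part of swish-e or libxml2. It only
loosely follows the argument style of UTF8Toisolat1() (buffer sizes passed in,
byte counts passed back); the return values are this sketch's own, so the
code in Convert_to_latin1() that checks the result would still need adjusting.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical replacement for the UTF8Toisolat1() call (not swish-e code).
 * On entry, *outlen is the size of the output buffer and *inlen is the
 * number of input bytes. On return, *outlen is the number of bytes written
 * and *inlen is the number of bytes consumed.
 * Returns 0 on success, -1 if the converter could not be opened,
 * -2 if the conversion stopped early (unconvertible character,
 * invalid or truncated UTF-8, or output buffer full).
 */
int utf8_to_8bit(const char *to_charset,
                 unsigned char *out, int *outlen,
                 const unsigned char *in, int *inlen)
{
    /* iconv_open() takes the target charset first, then the source. */
    iconv_t cd = iconv_open(to_charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    /* iconv() advances these pointers and decrements the counters. */
    char *inp = (char *)in;
    char *outp = (char *)out;
    size_t inleft = (size_t)*inlen;
    size_t outleft = (size_t)*outlen;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    /* Report progress even on failure so the caller can see where it stopped. */
    *inlen = (int)(inp - (char *)in);
    *outlen = (int)(outp - (char *)out);

    return rc == (size_t)-1 ? -2 : 0;
}
```

Calling it with "ISO-8859-2" (or whatever 8-bit charset your documents use)
as to_charset would give you that encoding in the index instead of Latin-1.
Opening the converter on every call is wasteful; in real code you would
probably open it once and reuse it. Characters that do not exist in the
target charset make iconv() stop with an error, so you would also need to
decide how to handle those.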
--
Bill Moseley
moseley@hank.org
Received on Thu Mar 24 06:37:10 2005