After a long test, I am finally going to use swish to index a database of
over 2 million records. Each record contains "news" encoded with "our"
encoding (i.e. standard 8-bit ASCII, with special characters coded by
special escape sequences). I have Java libs that convert these strings into
a normalized form (for indexing) and into HTML code (to show on a web page).
I built a Java index filter that queries the database and builds an XML2
document such as:
<?xml version="1.0" encoding="ASCII" standalone="yes" ?>
<field>Information normalized to be indexed, built with a Java function that
normalizes "our" string</field>
<property>Property information to put in the swish property, built with a
Java function that transforms "our" string into HTML, i.e. special characters are like
For many records swish says:
CFI0515272.xml:1: warning: Failed to convert internal UTF-8 to Latin-1.
Replacing non ISO-8859-1 char with char ' '
697;d</subproperty><subproperty type="700.b">, Vsevolod Ėmilʹevič
It does not like the č. I think the problem is that my HTML is encoded
with UTF-8, and swish, which works in Latin-1, does not recognize it. For my
purposes I need swish to store the string in the property as it is (with
č) ... How can I manage it?
Is my choice of encoding correct? I'm not so good with special character
encoding problems :-)
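
To make the situation concrete, here is a minimal Java sketch of the two
things involved: writing non-ASCII characters as numeric character
references so that a document declared as encoding="ASCII" stays valid, and
checking whether a string fits in ISO-8859-1 at all. The class and method
names are hypothetical; this is not the real filter and not any swish API.
It only illustrates why a name like the one in the warning cannot survive a
conversion to Latin-1.

```java
class EncodingSketch {

    /** True if every code point of s exists in ISO-8859-1 (U+0000..U+00FF). */
    static boolean fitsLatin1(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFF);
    }

    /**
     * Escapes XML markup characters and writes every non-ASCII code point
     * as a decimal numeric character reference (e.g. U+010D becomes &#269;).
     */
    static String toAsciiXmlText(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp);               // plain ASCII, copy as-is
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal reference
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The name from the warning: U+0116, U+02B9 and U+010D are all above U+00FF.
        String name = "Vsevolod \u0116mil\u02B9evi\u010D";
        System.out.println(fitsLatin1(name));      // false
        System.out.println(toAsciiXmlText(name));  // Vsevolod &#278;mil&#697;evi&#269;
    }
}
```

The references keep the XML itself pure ASCII, but once the parser resolves
them the characters are still outside Latin-1, which matches the
warning above.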
Biblioteca Nazionale Centrale di Firenze
Piazza Cavalleggeri 1
Tel.: +39 055 24919 220
Received on Fri May 10 11:16:28 2002