pgeo@gmx.de wrote on 07/06/2010 08:01 AM:
>
> Hi all,
>
> I have a little question about indexing PDFs:
>
> When I start indexing, I get errors like this:
>
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> - Using HTML parser - (2816 words)
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
>
> and so on, and I don't know why.
Google tells me those error messages come from pdftotext, part of the
xpdf package. They aren't from swish-e.
> In the search results I can see the PDFs, but without umlauts (ä, ü, ö, ...)
>
> ... Der Tarif gilt nicht _für_ Mehrwertdienste ...
The error messages above suggest the problem is with the text encoding
inside the PDF documents: the ToUnicode CMap is the table that maps a
font's character codes to Unicode. Swish-e 2.x does not support UTF-8
encoding, but it does support single-byte encodings. Try running
pdftotext directly on one of the PDF files and examine the output to see
what swish-e is receiving.
% pdftotext file.pdf file.txt
% hexdump -C file.txt
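
As a rough sketch of what to look for in that dump: the word "für" from
your search result looks different byte for byte depending on the
encoding. The helper names below are made up for illustration, and od is
used instead of hexdump only because it is more widely installed.

```shell
# Hypothetical helpers, for illustration only: each prints the German
# word "für" followed by a newline, in one specific encoding.
latin1_sample() { printf 'f\374r\n'; }      # ISO-8859-1: u-umlaut is the single byte 0xfc
utf8_sample()   { printf 'f\303\274r\n'; }  # UTF-8: u-umlaut is the two bytes 0xc3 0xbc

# Dump each sample byte by byte, the same way you would inspect the
# pdftotext output.
latin1_sample | od -An -tx1
utf8_sample | od -An -tx1
```

If the umlauts in file.txt show up in the two-byte form, the text is
UTF-8. In that case xpdf's pdftotext has an -enc option (for example
-enc Latin1) that may be worth trying, though I have not checked it
against your files.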
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Tue Jul 6 10:46:09 2010