
Re: [swish-e] swish-e - Help with indexing pdf´s

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Jul 06 2010 - 14:46:03 GMT
pgeo@gmx.de wrote on 07/06/2010 08:01 AM:
> Hi all,
> 
> I have a small question about indexing PDFs:
> 
> When I run the indexer, I get errors like this:
> 
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> - Using HTML parser -  (2816 words)
> Error: Illegal entry in bfchar block in ToUnicode CMap
> Error: Illegal entry in bfchar block in ToUnicode CMap
> and so on, and I don't know why.


Google tells me those error messages come from pdftotext, part of the
xpdf package. They aren't from swish-e.



> In the search results I can see the PDFs, but without umlauts (ä, ü, ö, ...).
> 
> ... Der Tarif gilt nicht _für_ Mehrwertdienste ...

The error messages above suggest the problem is with the encoding of the
PDF documents. Swish-e 2.x does not support UTF-8 encoding, but it does
support single-byte encodings such as ISO-8859-1 (Latin-1). Try running
pdftotext directly on one of the PDF files and examine the raw bytes of
the output to see exactly what swish-e is receiving.

 % pdftotext file.pdf file.txt
 % hexdump -C file.txt
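Here is a small self-contained illustration of what to look for in that hexdump. It is my own sketch, not something from the original report. It writes the word "für" in both encodings and prints the bytes: in UTF-8, "ü" is the two bytes c3 bc, while in ISO-8859-1 (Latin-1) it is the single byte fc. The file names are hypothetical. The final iconv step assumes that re-encoding pdftotext's output to Latin-1 before indexing is an acceptable workaround for your setup. That step is only a suggestion, not a swish-e feature.

```shell
#!/bin/sh
# Sketch: compare UTF-8 and Latin-1 byte sequences for "für".
# File names are hypothetical sample files, not real pdftotext output.
set -e

# UTF-8: "ü" (U+00FC) is encoded as two bytes, 0xC3 0xBC
printf 'f\303\274r\n' > utf8.txt
# ISO-8859-1 (Latin-1): "ü" is the single byte 0xFC
printf 'f\374r\n' > latin1.txt

# Dump the raw bytes; od is used here as a portable stand-in for hexdump -C
od -An -tx1 utf8.txt     # bytes: 66 c3 bc 72 0a
od -An -tx1 latin1.txt   # bytes: 66 fc 72 0a

# Assumed workaround: re-encode UTF-8 text to Latin-1, a single-byte
# encoding that swish-e 2.x can index
iconv -f UTF-8 -t ISO-8859-1 utf8.txt > converted.txt

# The converted file should now match the Latin-1 sample byte for byte
cmp -s converted.txt latin1.txt && echo "converted.txt is Latin-1"
```

If you see pairs like c3 bc in the real pdftotext output where umlauts should be, swish-e is being fed UTF-8. Note that iconv stops with an error if the text contains characters that have no Latin-1 equivalent.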




-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Jul 6 10:46:09 2010