Hi all,
I have a short question about indexing PDFs.
When I start indexing, I get errors like this:
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
 - Using HTML parser - (2816 words)
Error: Illegal entry in bfchar block in ToUnicode CMap
Error: Illegal entry in bfchar block in ToUnicode CMap
and so on, and I don't know why.
Here's my swish.conf:
IncludeConfigFile /opt/swish-e/conf/global.config
# Tell Swish-e what to index (same as -i switch above)
IndexDir /srv/www/html
IndexFile /srv/www/index/swish.index
# Only index HTML and text files
IndexOnly .htm .html .php .doc .xml .pdf
# FileFilter .pdf /export/home/swish-e/filter-bin/_pdf2html.pl
FileFilter .pdf /usr/bin/pdftotext "'%p' -"
# Tell Swish-e that .txt files are to use the text parser.
IndexContents TXT* .txt
# Otherwise, use the HTML parser
DefaultContents HTML*
StoreDescription HTML <search> 200000
# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9
PropertyNameAlias swishtitle title
MetaNames swishdocpath swishtitle
# Don't index any directories that contain the path segment "old" (/usr/local/old/foo)
FileRules dirname contains /php/
FileRules dirname contains /Bilder/
In the search results I can see the PDFs, but without umlauts (ä, ü, ö, ...):
... Der Tarif gilt nicht für Mehrwertdienste ...
In my global.config I added:
WordCharacters ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789&#;|-'"()~!@$%_+äöüÄÖÜ
... but it still doesn't work.
Does anybody have an idea?
Regards
Peter
Received on Tue Jul 6 09:01:20 2010