> Hi all,
>
> I use pdftotext to index pdf-files.
> This works ok.
> The only problem is that in the output of pdftotext there are many spaces.
>
> If in the pdf-file there is the string "2001", then in the output of
> pdftotext I find "2 0 0 1".
>
I don't see this behavior with pdftotext 3.02.
The original may actually have space characters as a way to do faux
letter spacing. What happens if you copy the text from the PDF file and
paste it into a text editor?
_______________________________________________________________________
If I copy the text from the PDF file and paste it into a text editor, the
text is correct ("2001").
I have installed version 3.02, but the problem is not solved.
I have also written a small Perl script that uses "CAM::PDF", but the result
is the same as with pdftotext (the output is "2 0 0 1").
I have another converter (written in Java) that works correctly (the output
is "2001").
If possible, I would prefer to find a working solution in Perl.
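
One possible stopgap is to post-process the extracted text and collapse runs
of single characters separated by single spaces. The sketch below is only a
heuristic, not a fix for the extraction itself. The function name
collapse_spaced_runs and the three-character minimum are assumptions made for
illustration. It is written in Python, but the pattern uses only lookarounds
and quantifiers, which Perl regular expressions also support.

```python
import re

# Matches a run of at least three single non-space characters, each separated
# by exactly one space, with whitespace (or the string edge) on both sides.
# The minimum of three is an assumed threshold, chosen so that ordinary pairs
# of one-letter tokens such as "a b" are left alone.
SPACED_RUN = re.compile(r"(?<!\S)(?:\S ){2,}\S(?!\S)")


def collapse_spaced_runs(text: str) -> str:
    """Join letter-spaced runs such as "2 0 0 1" into "2001"."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(collapse_spaced_runs("year 2 0 0 1 report"))
```

This cannot tell faux letter spacing apart from a genuine sequence of
one-character tokens, and if a whole line is letter-spaced with single spaces
between words as well, the word boundaries are lost. It is best applied only
to documents known to show the problem.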
Any help is appreciated.

Michelangelo
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Mar 10 07:25:35 2009