[swish-e] R: pdftotext

From: Michelangelo Rezzonico <mrezzonico(at)not-real.ticino.com>
Date: Tue Mar 10 2009 - 11:25:31 GMT
> Hi all,
> 
> I use pdftotext to index pdf-files.
> This works ok.
> The only problem is that in the output of pdftotext there are many spaces.
> 
> If in the pdf-file there is the string "2001", then in the output of
> pdftotext I find "2 0 0 1".
> 

I don't see this behavior with pdftotext 3.02.

The original may actually have space characters as a way to do faux
letter spacing.  What happens if you copy the text from the PDF file and
paste it into a text editor?

_______________________________________________________________________


If I copy the text from the PDF file and paste it into a text editor, the
text is correct ("2001").

I have installed version 3.02, but the problem is not solved.

I have written a small Perl script that uses "CAM::PDF", but the problem is
the same as with pdftotext (the output is "2 0 0 1").

I have another converter (written in Java) that works OK (the output is
"2001").

If possible, I would prefer to find a working solution in Perl.

Any help is appreciated.

Michelangelo
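
One possible stopgap, offered as an assumption about this kind of output
rather than a fix for the extraction itself, is to post-process the
extracted text and collapse runs of single digits separated by single
spaces. The sketch below is in Python; the function name
collapse_spaced_digits and the threshold of three digits are hypothetical
choices, not part of pdftotext or CAM::PDF. The same regular expression
should carry over to a Perl s///ge substitution.

```python
import re

# Assumption: the unwanted output looks like "2 0 0 1", i.e. single digits
# separated by exactly one space. Requiring at least three digits is an
# arbitrary threshold meant to leave pairs such as "1 2" untouched.
_SPACED_DIGITS = re.compile(r"(?<!\d)\d(?: \d){2,}(?!\d)")


def collapse_spaced_digits(text: str) -> str:
    """Remove the spaces inside runs of space-separated single digits."""
    return _SPACED_DIGITS.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(collapse_spaced_digits("Report for 2 0 0 1, page 1 2"))
```

This heuristic would also merge single digits that really are separate in
the document, for example a table row such as "1 2 3", so it is only safe
where such rows do not occur.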

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Mar 10 07:25:35 2009