Re: pdftotext - erroring out

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Oct 24 2002 - 14:53:11 GMT
On Thu, 24 Oct 2002, intervolved none wrote:

> 
> Thanks Bill for the response. It fails on every PDF it runs against.
> I have downloaded PDFs from the web, tried to index them, and all of
> them fail.  I have run the program pdftotext.exe at the command line
> and it converts the files fine (I have not brought the output up in a
> hex editor to look for unprintables...).  What I mean by fine is that
> I see the text that was in the PDF file and there are no noticeable
> problems.

Well then you are not trying the right file. ;)

That error message is from either pdftotext or pdftoinfo.  I've had
similar problems and it was a matter of finding a way to show me which
file was the problem, as I explained.

Start isolating your problem.  Narrow it down to one file.  Then split
the indexing into separate steps and I'm sure you will find the problem.
Edit the PDF conversion script to warn() the file name before calling
pdftoinfo and pdftotext.  Make sure the source PDF is byte-for-byte
identical to the file that pdftotext is seeing.  Try all the normal
debugging steps.
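
As a rough illustration of the "name the file before converting it" idea,
here is a minimal Python sketch. It is not the conversion script that ships
with the indexer. The function name find_failing_files and its command
parameter are made up for this example, and it assumes pdftotext is on the
PATH and returns a non-zero exit status when a conversion fails.

```python
import subprocess
import sys


def find_failing_files(paths, command=("pdftotext",)):
    """Run the converter on each file and return (path, error) for failures.

    `command` is a hypothetical hook so a different converter (or a stub)
    can be substituted; it defaults to pdftotext found on the PATH.
    """
    failures = []
    for path in paths:
        # Announce the file first, so any error output that follows
        # can be matched to a specific file.
        print(f"converting: {path}", file=sys.stderr)
        result = subprocess.run(
            [*command, path, "-"],  # "-" sends the extracted text to stdout
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
        )
        if result.returncode != 0:
            message = result.stderr.decode(errors="replace").strip()
            failures.append((path, message))
    return failures
```

Calling find_failing_files() with the list of PDFs being indexed should
narrow things down to the files that actually trigger the error. If the
converter is not installed, subprocess.run raises FileNotFoundError, which
is a useful finding in itself.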

Are you on Windows or OS X?  Maybe you are seeing some line ending
conversions.
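
One way to check for that, sketched in Python under the assumption that you
can get hold of both the original PDF and the copy the indexer actually
reads. The helper names sha256_of and differs_only_by_line_endings are
invented for this example. PDF is a binary format, so any CRLF/LF
translation applied to it in transit changes the bytes and can break the
file.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differs_only_by_line_endings(original, copy):
    """True if the files differ but match once CRLF is folded to LF."""
    with open(original, "rb") as a, open(copy, "rb") as b:
        data_a, data_b = a.read(), b.read()
    if data_a == data_b:
        return False  # identical files: no conversion happened
    return data_a.replace(b"\r\n", b"\n") == data_b.replace(b"\r\n", b"\n")
```

If the two digests differ and differs_only_by_line_endings() returns True,
something between the source and the converter is treating the PDF as text.
This only detects CRLF versus LF changes; other kinds of corruption show up
only as a digest mismatch.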

-- 
Bill Moseley moseley@hank.org
Received on Thu Oct 24 14:56:57 2002