Re: Pdf problem

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Dec 14 2004 - 15:08:05 GMT

On Tue, Dec 14, 2004 at 02:51:41AM -0800, Andrea Pasquini wrote:
> Hi,
> I use swish-e_2.4.2 and I have a problem with PDF files.
> After launching  $ ./swish-e -Sprog -c swish.conf  these errors appear in the
> output, and the crawler carries on:
> ...
> Error: Couldn't find cidToUnicode file for the 'Adobe-WinCharSetFFFF' collection
> Error: Unknown character collection 'Adobe-WinCharSetFFFF'
> Error: Unknown font tag 'R137'
> Error: May not be a PDF file (continuing anyway)
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Couldn't find trailer dictionary
> Error: Couldn't read xref table
> http://www.di.unipi.it/sindacati/21set2004.pdf - Using HTML2 parser -  (no
> words indexed)

Those errors are output from pdftotext, not from swish-e.  When I run
pdftotext on that file directly, this is all I get:

$ pdftotext 21set2004.pdf out.txt
Error: Unknown character collection 'Adobe-WinCharSetFFFF'
Error: Unknown font tag 'R137'

Despite those two warnings, it seems to have generated the text output
without any other problems.

You might try updating your version of xpdf, the package that provides
pdftotext.
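
To tell a noisy but usable conversion apart from a real failure, something
like the following rough sketch may help.  It runs the converter on one file,
keeps the warnings apart from the extracted text, and reports how many words
came out.  The function name check_pdf and the PDFTOTEXT override variable are
made up for this example; it assumes your pdftotext accepts "-" as the output
file to write the text to stdout.

```shell
#!/bin/sh
# Sketch: convert one PDF to text and report whether any words came out.
# check_pdf and PDFTOTEXT are hypothetical names used only in this example.

check_pdf() {
    pdf=$1
    converter=${PDFTOTEXT:-pdftotext}
    out=$(mktemp) || return 2
    err=$(mktemp) || { rm -f "$out"; return 2; }

    # "-" as the output file sends the extracted text to stdout;
    # warnings such as "Error: Unknown font tag" go to stderr.
    status=0
    "$converter" "$pdf" - >"$out" 2>"$err" || status=$?

    # Count extracted words and the number of "Error" lines on stderr.
    words=$(wc -w <"$out" | tr -d ' ')
    warnings=$(grep -c '^Error' "$err" || true)
    rm -f "$out" "$err"

    echo "$pdf: exit=$status words=$words warnings=$warnings"

    # Succeed only if some text was extracted, regardless of warnings.
    [ "$words" -gt 0 ]
}
```

Calling check_pdf 21set2004.pdf after downloading the file would then show
whether your installed pdftotext extracts any words from it.  If the word
count is zero on your machine but not elsewhere, that points at the local
xpdf install rather than at swish-e.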



-- 
Bill Moseley
moseley@hank.org

Received on Tue Dec 14 07:08:10 2004