Re: [swish-e] pdftotext

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Aug 19 2009 - 14:56:20 GMT
John Laurie wrote on 06/28/2009 05:41 PM:
> Hi all
>
> I’m having the same problem as Thomas Dowling, with pdftotext creating
> unwanted spaces in PDF documents. It’s a crippling problem for a
> database that’s aiming for 100% accuracy.
>
> The PDF native interface search works fine, but the Swish-E based search
> has text that’s full of words with gaps between the letters, e.g.
> "k o o t i" instead of "kooti".
>
> Our Swish-E is the latest version, with pdftotext 3.02.
>
> I’ve only noticed it recently. It’s only a big problem with some fonts,
> or perhaps with newer versions of FineReader and Adobe.
> 

Sorry this has languished in my inbox for a while.

This is likely a problem with pdftotext, not swish-e. Try running the 
pdf through pdftotext directly and examine the text output. If the 
spaces are there, your problem lies with the xpdf software.
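
A rough sketch of that check in Python, for anyone who would rather not
eyeball the output by hand. It assumes pdftotext is on your PATH and
relies on the convention that "-" as the output file sends the text to
stdout. The names find_spaced_words and extract_text are made up for this
example, and the "three or more single letters in a row" rule is only a
heuristic, so expect the odd false positive.

```python
import re
import subprocess
import sys

# Heuristic: three or more single letters separated by single spaces,
# e.g. "k o o t i". This is an assumption, not a pdftotext feature.
SPACED_WORD = re.compile(r"(?<!\S)(?:[^\W\d_] ){2,}[^\W\d_](?!\S)")


def find_spaced_words(text):
    """Return each run of single letters that looks like a split-up word."""
    return SPACED_WORD.findall(text)


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" writes the text to stdout
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


# Only runs when called as: python check_pdf.py some.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    for run in find_spaced_words(extract_text(sys.argv[1])):
        print(run)
```

If this prints runs of letters for a given PDF, the gaps are already in
the pdftotext output and swish-e is just indexing what it was handed.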

It looks like xpdf development has slowed, except for security patches.
You might try poppler instead:
http://poppler.freedesktop.org/



-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
gpg key: 37D2 DAA6 3A13 D415 4295  3A69 448F E556 374A 34D9
Received on Wed Aug 19 10:56:22 2009