Re: [swish-e] pdftotext

From: John Laurie <j.laurie(at)>
Date: Sun Jun 28 2009 - 22:41:11 GMT
Hi all


I'm having the same problem as Thomas Dowling: pdftotext is inserting
unwanted spaces into the text it extracts from PDF documents. It's a
crippling problem for a database that's aiming for 100% accuracy.


The PDF's native search interface works fine, but the Swish-e based
search runs on text that's full of words with gaps between the letters,
e.g. k o o t i instead of kooti.
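
One way to gauge how many documents are affected is to scan the
pdftotext output for runs of single letters separated by spaces. The
following is only a minimal Python sketch: the function names and the
three-letter threshold are assumptions made for illustration, and
nothing here is part of Swish-e or pdftotext.

```python
import re


def find_spaced_runs(text, min_letters=3):
    """Return runs such as 'k o o t i' found in extracted text.

    A run is min_letters or more single letters, each separated by one
    space or tab. min_letters is an assumed threshold; raise it if
    ordinary text produces false positives.
    """
    pattern = re.compile(
        r"(?<!\w)(?:[^\W\d_][ \t]){%d,}[^\W\d_](?!\w)" % (min_letters - 1)
    )
    return [m.group(0) for m in pattern.finditer(text)]


def collapse(run):
    """Join a letter-spaced run back into one token."""
    return re.sub(r"[ \t]", "", run)


if __name__ == "__main__":
    sample = "the k o o t i raid"
    for run in find_spaced_runs(sample):
        print(repr(run), "->", collapse(run))
```

Running this over the text extracted from each PDF would show which
files contain spaced-out words. It does not recover word boundaries
when a whole line is spaced out, so it is a diagnostic rather than a
fix.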


We are running the latest version of Swish-e, with pdftotext 3.02.


I've only noticed it recently. It seems to be a big problem only with
some fonts, or perhaps with newer versions of FineReader and Adobe.


There is an example on our Early New Zealand books website at

Click on Search >> Go to Advanced Search

Click on Limit by Title and click on the + sign beside 1887 - Gudgeon,
T. W. The Defenders of New Zealand

Tick the box beside [Pages 300-335]


Search For: k o o t i     This phrase from the dropdown menu in Full


Click on 1[pages 300-335] to view the PDF. You can copy and paste text
from the PDF with no gaps between the letters. 


Try the same search for kooti or t h e or a n d


N.B. Te Kooti is a famous Maori leader and prophet who led a bitter
struggle against the colonial government in New Zealand in the 1860s -
an antipodean Geronimo. The name is a transliteration in Maori of the
missionary name Coates. 




John Laurie 
Digital Initiatives Librarian 
Digital Services
Level 3, General Library 
University of Auckland
Phone (09)3737599 x 85773 


Users mailing list
Received on Sun Jun 28 18:41:13 2009