Re: Indexing PDF files - reliable ?

From: Cutts III, James H. <CuttsJ(at)not-real.missouri.edu>
Date: Thu Dec 08 2005 - 22:50:09 GMT
Not all PDF files are searchable. It depends on how the PDF was created.
There are three types of PDF: PDF Normal, PDF Searchable Image, and PDF
Image Only. Only the first two carry a text layer that can be extracted
and indexed; an image-only PDF holds nothing but scanned page images.
See http://www.dclab.com/pdf_conversion.asp for more information about
these types.
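
One rough way to spot an image-only PDF is to check whether pdftotext
produces any text at all. The following is only a minimal Python sketch,
not part of swish-e. It assumes pdftotext is installed and on the PATH;
the function names and the min_chars threshold are made up for
illustration. It cannot tell PDF Normal from PDF Searchable Image, since
both have a text layer.

```python
import subprocess


def extract_text(pdf_path):
    """Return whatever text pdftotext extracts from pdf_path.

    Passing "-" as the output file makes pdftotext write to stdout.
    Assumes the pdftotext binary is available on the PATH.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(text, min_chars=1):
    """True if text has at least min_chars non-whitespace characters.

    min_chars is a hypothetical threshold; form feeds and newlines,
    which pdftotext emits even for empty pages, are not counted.
    """
    return sum(1 for ch in text if not ch.isspace()) >= min_chars
```

With a file of your own, something like
has_text_layer(extract_text("document.pdf")) returning False would
suggest an image-only PDF ("document.pdf" is a placeholder name).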


James H. Cutts III
CORI - 143C Mumford


-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu] On Behalf Of David Larkin
Sent: Thursday, December 08, 2005 4:43 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Indexing PDF files - reliable ?

On Thu, 8 Dec 2005 13:42:31 -0800 (PST)
Bill Moseley <moseley@hank.org> wrote:

> On Thu, Dec 08, 2005 at 01:05:59PM -0800, David Larkin wrote:
> > Is it due to PDF version number ?
> 
> Swish uses pdftotext.  Run that on the docs and see what comes out.
> 

79:{david}% grep the Samba-Developers-Guide.txt | wc -l
     206
80:{david}% grep the spm.txt  | wc -l
     437
81:{david}% grep the isj2001-final.txt | wc -l
       0
82:{david}%

isj2001-final.txt looks very strange. I wonder if the original PDF came
from a scanner or some such thing.
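
The same check as the grep commands above can be sketched in Python for
a whole batch of pdftotext output files. This is an illustrative sketch
only: the function names are hypothetical, and counting lines that
contain "the" is just a crude sign that English text was extracted.

```python
from pathlib import Path


def count_lines_containing(text, word="the"):
    """Count lines that contain word as a case-sensitive substring,
    roughly what `grep word file | wc -l` reports."""
    return sum(1 for line in text.split("\n") if word in line)


def report(paths, word="the"):
    """Map each text file name to its count of matching lines.

    A count of 0 suggests the source PDF had no usable text layer.
    """
    counts = {}
    for name in paths:
        text = Path(name).read_text(errors="replace")
        counts[name] = count_lines_containing(text, word)
    return counts
```

For example, report(["spm.txt", "isj2001-final.txt"]) would return a
dictionary of counts, and any file that scores 0 is worth opening by
hand.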


> --
> Bill Moseley
> moseley@hank.org
Received on Thu Dec 8 14:50:09 2005