Re: pdf2xml problem while indexing pdf files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Nov 12 2003 - 21:31:28 GMT
On Wed, Nov 12, 2003 at 12:12:12PM -0800, wayne.schomaker@state.co.us wrote:

> #!/usr/bin/perl -w
> use pdf2xml;
> my @files =
> system ('find /var/www/html/ccsp/docs/ -name *.pdf -print');
> # system ('find /var/www/html/ccsp/docs/ -name *.pdf > /var/www/html/ccsp/docs/results.file');

Does that work?  system() returns the exit status of the process, not
its output, so @files never holds any file names.  In Perl you use
backticks to capture output, but for this I'd use the standard
File::Find module.  An example is in the DirTree.pl file located in the
prog-bin directory of the distribution.
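
For illustration only, here is a minimal File::Find sketch of that
approach.  It is not the DirTree.pl code from the distribution; the
helper name find_pdfs is hypothetical, and the directory is taken from
the command line rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the full path of every *.pdf file
# found below $dir (recursively), sorted by name.
sub find_pdfs {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # Inside the callback $_ is the bare file name and
            # $File::Find::name is the path including $dir.
            push @found, $File::Find::name if -f $_ && /\.pdf\z/i;
        },
        $dir
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage: perl find_pdfs.pl /path/to/docs
if (@ARGV) {
    print "$_\n" for find_pdfs($ARGV[0]);
}
```

Unlike the system('find ...') call, this returns the file names
themselves as a Perl list, and there is no shell involved to expand an
unquoted *.pdf before find ever sees it.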

Then the pdf2xml.pm module runs pdftotext and pdfinfo.  Are those in
your path?
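
One way to check that from Perl is sketched below.  The helper name
in_path is hypothetical; it only uses the core File::Spec module to
walk the directories in PATH and assumes nothing about how pdf2xml.pm
itself locates the programs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of $prog if an executable
# file of that name exists in one of the PATH directories, else undef.
sub in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

for my $prog (qw(pdftotext pdfinfo)) {
    my $where = in_path($prog);
    print defined $where
        ? "$prog: $where\n"
        : "$prog: not found in PATH\n";
}
```

Run it as the same user, and with the same environment, that runs the
indexing script, since PATH may differ between an interactive shell and
a cron job or web server.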

> I and my tech support cannot figure out what "..file 65280.." is.  There is
> no such filename anywhere on the server and it is not a PDF file in our test
> directory (../ccsp/docs/). We are at a loss as to what to do next.

It's probably the return code from the system call: 65280 is 255 << 8,
which is how system() reports an exit status of 255.
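
As a small sketch of how such a number breaks down: system() returns
the same wait status that Perl stores in $?, with the exit code in the
high byte.  The helper name decode_status is hypothetical, and it does
not cover the -1 that system() returns when the command could not be
started at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a wait status, as returned by system(),
# into its documented parts.
sub decode_status {
    my ($status) = @_;
    return (
        exit_code => $status >> 8,              # exit value of the child
        signal    => $status & 127,             # signal that killed it, if any
        core_dump => ($status & 128) ? 1 : 0,   # whether it dumped core
    );
}

my %s = decode_status(65280);
print "exit code: $s{exit_code}, signal: $s{signal}\n";
# prints "exit code: 255, signal: 0"
```

So a "file" named 65280 is what you get when the return value of
system() ends up in the list of file names.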


-- 
Bill Moseley
moseley@hank.org
Received on Wed Nov 12 21:31:51 2003