pdf2html.pm and File::Temp.pm

From: Gerald Klaas <gklaas(at)not-real.arb.ca.gov>
Date: Wed Jan 30 2002 - 18:34:45 GMT
I am using the -S prog method to index an intranet server.
My spider.pl has a function that uses pdf2html.pm to
convert PDF files.

I am currently seeing a lot of these errors:
---snip---
-Skipped http://inside.arb.ca.gov/ds/regact/01zev21.PDF due to 'filter_content'
user supplied function #1 death '../src/spider.pl: Failed close on pipe to
pdfinfo for /tmp/Gpcivvv24w: 256 at /app/swish/prog-bin/pdf2html.pm line 138.
'
---end snip---

Line 138 in pdf2html.pm is:
close $sym or die "$0: Failed close on pipe to pdfinfo for $file: $?";

Can someone offer troubleshooting tips to help me figure out why
the close on this pipe is failing?
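
One starting point, offered as a sketch rather than a diagnosis: in Perl, close on a piped filehandle also returns false when the child program exits with a non-zero status, and $? then holds the raw wait status. The 256 in the message above decodes to exit code 1. The xpdf documentation lists exit code 1 for pdfinfo as an error opening the PDF file, so if the installed version follows that convention, running pdfinfo by hand on a saved copy of one of the skipped PDFs may show the real complaint. The helper describe_status below is hypothetical and not part of pdf2html.pm, and $^X (the running perl) stands in for pdfinfo so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pdf2html.pm): turn a raw wait status,
# such as the 256 in the error above, into a readable description.
sub describe_status {
    my ($status) = @_;
    return "could not be started or waited for" if $status == -1;
    my $signal = $status & 127;
    return "was killed by signal $signal" if $signal;
    return "exited with code " . ( $status >> 8 );
}

print "status 256: child ", describe_status(256), "\n";

# Read from a child through a pipe, as pdf2html.pm does with pdfinfo.
# Assumption: $^X with 'exit 1' stands in for a failing pdfinfo run.
open( my $pipe, '-|', $^X, '-e', 'exit 1' ) or die "cannot fork: $!";
my @output = <$pipe>;
unless ( close $pipe ) {
    # For a pipe, $! is 0 when the only problem was a non-zero exit status.
    if ($!) {
        warn "close itself failed: $!\n";
    }
    else {
        warn "child ", describe_status($?), "\n";
    }
}
```

Reporting the decoded status and skipping the one file, rather than dying, is one way a filter could keep the rest of the spider run going.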

On a possibly related issue, pdf2html.pm uses File::Temp.pm.
I notice that my /tmp directory fills up with randomly named files during
the spider.pl run, and that they are all deleted after the run completes.
What can I change to remove each temp file as soon as it has been fed to
swish, rather than waiting until the end of the spider run?
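
For illustration only, here is a minimal sketch of the behavior being asked for. With File::Temp's tempfile(), passing UNLINK => 1 defers removal until the program exits, which would match the symptom described; whether pdf2html.pm actually creates its files that way is an assumption. The helper with_temp_file and its callback are hypothetical stand-ins for the conversion step, not code from pdf2html.pm.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write the fetched content to a temp file, hand the
# path to a callback, and remove the file before returning.
sub with_temp_file {
    my ( $content, $callback ) = @_;

    # No UNLINK option here: cleanup is done explicitly below.
    my ( $fh, $path ) = tempfile();
    print {$fh} $content;
    close $fh or die "Failed to close $path: $!";

    # Run the callback inside eval so the file is removed even if it dies.
    my $result = eval { $callback->($path) };
    my $error  = $@;
    unlink $path or warn "Could not remove $path: $!";
    die $error if $error;
    return $result;
}

# Usage: the callback here only reports the file size; a real one would
# run the PDF converter on $path.
my $seen;
my $size = with_temp_file( "dummy PDF bytes", sub { $seen = shift; -s $seen } );
print "temp file still exists: ", ( -e $seen ? "yes" : "no" ), "\n";
```

Removing the file inside the same function that created it keeps /tmp usage to one file at a time, at the cost of handling cleanup on error paths by hand.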

Gerald Klaas
Received on Wed Jan 30 18:36:29 2002