Re: Indexing PDFs on Windows - Revisited....

From: Bill Moseley <moseley(at)>
Date: Sat Sep 25 2004 - 13:25:02 GMT
On Fri, Sep 24, 2004 at 11:32:26PM -0700, Anthony Baratta wrote:
> I rummaged through the code and discovered someone had kindly added
> $self->{pid} = $pid;
> in the windows_fork of But I didn't find any references to 
> "waitpid".

Ah, we have been through this before.  waitpid is called in swish.cgi
and I thought it got added to after this discussion:

Looks like the discussion didn't finish.

> In "$filter_sub = sub { ... " (Approx. line 1051 in, I added 
> "waitpid($doc->{pid},0);" just after "my $doc = $filter->convert( .." 
> and before "return 1 unless $doc;"

But that won't really work because, as in the case of the PDF
filter, two programs are being run.

The correct solution is to make the call to windows_fork (the call to
IPC::Open2) return an object and then have a DESTROY function that
calls waitpid.
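
A minimal sketch of that DESTROY idea, not the actual SWISH::Filter code. The class name ChildHandle and its methods (pid, reader, writer, finish_input) are hypothetical, and the child command is just the Perl interpreter upper-casing its input so the example can run anywhere. It assumes a Perl recent enough that open2 autovivifies its handle arguments.

```perl
package ChildHandle;    # hypothetical name, not part of SWISH::Filter
use strict;
use warnings;
use IPC::Open2;

# Start the command with open2 and remember its PID and both handles.
sub new {
    my ( $class, @cmd ) = @_;
    my ( $rdr, $wtr );
    my $pid = open2( $rdr, $wtr, @cmd );
    return bless { pid => $pid, rdr => $rdr, wtr => $wtr }, $class;
}

sub pid    { $_[0]{pid} }
sub reader { $_[0]{rdr} }
sub writer { $_[0]{wtr} }

# Close the child's stdin so it sees EOF and can exit.
sub finish_input {
    my $self = shift;
    my $wtr  = delete $self->{wtr};
    close $wtr if $wtr;
    return;
}

# Reap the child when the object goes away, so no entry is left behind
# in the process table.
sub DESTROY {
    my $self = shift;
    local ( $!, $? );    # do not clobber the caller's status variables
    $self->finish_input;
    my $rdr = delete $self->{rdr};
    close $rdr if $rdr;
    waitpid $self->{pid}, 0 if $self->{pid};
    return;
}

package main;

{
    my $child = ChildHandle->new( $^X, '-e', 'print uc while <STDIN>' );
    print { $child->writer } "hello\n";
    $child->finish_input;
    my $out  = $child->reader;
    my $line = <$out>;
    print $line;    # prints "HELLO"
}   # $child goes out of scope here and DESTROY calls waitpid
```

With this shape, a filter that runs two programs gets two objects, and each one cleans up after itself without convert() needing to know about PIDs.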

Another way might be to save all the PIDs.  So in windows_fork():

    push @{$self->{pid}}, $pid;

Then in convert()

        eval {
            local $SIG{__DIE__};
            $filtered_doc = $filter->filter($doc_object);
        };

        # clean up Windows process table
        if ( ref $doc_object->{pid} ) {
            waitpid $_, 0 for @{ $doc_object->{pid} };
            delete $doc_object->{pid};
        }
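
The PID-list approach can also be tried in isolation, outside SWISH::Filter. The following is only a sketch: $doc is a plain hash standing in for the document object, run_and_record is a hypothetical stand-in for windows_fork(), and the two child commands are trivial Perl one-liners rather than real PDF tools.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical stand-in for the document object; only {pid} matters here.
my $doc = { pid => [] };

# Stand-in for windows_fork(): run a command, record its PID on the
# document object, and return everything the command printed.
sub run_and_record {
    my ( $doc, @cmd ) = @_;
    my $pid = open2( my $out, my $in, @cmd );
    push @{ $doc->{pid} }, $pid;
    close $in;                  # nothing to send to the child
    local $/;                   # slurp all output at once
    my $text = <$out>;
    return $text;
}

# Two programs run for one document, as with the PDF filter.
my $title = run_and_record( $doc, $^X, '-e', 'print "title"' );
my $body  = run_and_record( $doc, $^X, '-e', 'print "body"' );

# Reap every recorded child, then forget the list.
my @reaped;
if ( ref $doc->{pid} ) {
    push @reaped, waitpid( $_, 0 ) for @{ $doc->{pid} };
    delete $doc->{pid};
}

print "reaped ", scalar(@reaped), " children\n";
```

Because the list is stored on the document object, it does not matter how many programs a single filter starts; all of them are waited on in one place.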

Can you test that one?  I'm not sure how long it takes to test --
maybe you could create a list of links to a bunch of small PDFs on
your local machine so it will run fast.

Or, if you can figure out how to use Win32::Process, you could avoid
IPC::Open2 completely.

I wonder what happens on Win98.  I thought I tried there once and $pid
was always the same number.

Bill Moseley

Received on Sat Sep 25 06:25:20 2004