Re: Filter.pm & Windows Thread safety

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun May 23 2004 - 05:12:51 GMT

Hi James,

On Sat, May 22, 2004 at 08:41:16PM -0700, Job, James wrote:
> However, about 25-50 documents into my crawl, I'd start seeing "Skipped
> whatever.doc due to filter 'filter_content' user supplied function #1.

So the filter started to fail.


> 
> Looking at task manager, I would see a running "catdoc" or "pdftotext"
> process.  After tearing my hair out for a while, I suspected there may be a
> threading issue (since I'm running a SMP system),

I don't know anything about SMP or threaded applications.  Can you
explain why just having two CPUs would result in such a problem?

> and made some changes to
> the windows_fork subroutine in Filter.pm.  I eventually had success with the
> following:

Good.  I'll apply the patch, but I'd like to understand what's
happening.

>     my $pid = IPC::Open2::open2($rdrfh, $wtrfh, @command );
> 
>     # --- BEGIN WIN32 SMP MODS
>     # Wait for Process to complete before we continue (max 10 sec), else kill it!
>     use POSIX ":sys_wait_h";
>     my ($stiff, $tcks);
>     $tcks = 0;
>     while (($stiff=waitpid(-1,&WNOHANG))>0 && $tcks<9) {
>     	sleep 1;
>     	$tcks++;
>     	}
>     if ($tcks>8) {
>     	$pid->Kill(9);
>     	}
>     # --- END WIN32 SMP MODS

OK, so is that waiting on the program that was just run?  Seems like you
would want to do that after reading from the pipe.  I would think the OS
would block the program until the pipe was read from, so the wait would
always time out and the program would always get killed.

Or is it too late and I'm missing something obvious?
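
To make the ordering question concrete, here is a minimal sketch of the
alternative: read from the pipe first, then wait on that one child.  It is
not the code in Filter.pm; run_and_reap() and its $timeout argument are
made-up names for illustration, and whether waitpid() with WNOHANG behaves
this way on a given Win32 perl is an assumption worth testing.

```perl
use strict;
use warnings;
use IPC::Open2;
use POSIX ":sys_wait_h";

# Hypothetical helper (not part of Filter.pm): run a command, read all of
# its output, then reap that specific child, killing it after $timeout sec.
sub run_and_reap {
    my ( $timeout, @command ) = @_;

    my ( $rdrfh, $wtrfh );
    my $pid = open2( $rdrfh, $wtrfh, @command );
    close $wtrfh;    # nothing to send; the child sees EOF on its stdin

    # Drain the pipe first so the child cannot block on a full pipe buffer.
    my $output = do { local $/; <$rdrfh> };
    close $rdrfh;

    # Poll for this child only; waitpid(-1, ...) would match any child.
    my $ticks = 0;
    while ( waitpid( $pid, WNOHANG ) == 0 ) {
        if ( ++$ticks > $timeout ) {
            # open2 returns a plain process ID, so use the kill() built-in.
            kill 'KILL', $pid;
            waitpid( $pid, 0 );
            last;
        }
        sleep 1;
    }
    return $output;
}

# Example: a child that writes more than a typical pipe buffer holds.
my $out = run_and_reap( 10, $^X, '-e', 'print "x" x 100_000' );
print length($out), "\n";
```

Note that the timeout here only covers a child that has closed its output
but not yet exited.  A filter program that hangs while still holding the
pipe open would block the read itself, so this sketch does not address that
case.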

Thanks,


BTW -- what ever happened with your other problem:

  Warning: Failed to uncompress Property. zlib uncompress returned: -5.
  uncompressed size: 140 buf_len: -1073746392

Did that go away after reindexing?


-- 
Bill Moseley
moseley@hank.org
Received on Sat May 22 22:12:53 2004