Catdoc - following on from the Open2 problems in Win 2000

From: <Allan_Watts(at)>
Date: Fri Jan 23 2004 - 04:59:59 GMT

I'm still experimenting with ways to get around my Open2 problem (the
"open2: IO::Pipe: Can't spawn-NOWAIT:..." error after reading 64 files).

To recap, waitpid didn't solve the problem for me. I sometimes get the
same NOWAIT error after reading 65 files (an improvement of 1), but it is
erratic - mostly the command prompt session just freezes.

However, it did work in this test script (reading 1000 files called
0000.doc to 0999.doc - collected from my C: drive and copied into a
directory c:\cat):

use IPC::Open2;

my ($rdrfh, $wtrfh);   # read-from-child and write-to-child handles

for (my $k = 0; $k < 1000; $k++) {
      # Build the names 0000.doc .. 0999.doc
      my $filename = "c:/cat/".substr("0000$k",-4).".doc";
      #my $command = "c:\\data\\swish\\catdoc\\catdoc.exe $filename";  # version 0.93.3
      my $command = "c:\\progra~1\\swish-e\\lib\\swish-e\\catdoc.exe $filename"; # Dave's version(?)
      my $pid = IPC::Open2::open2($rdrfh, $wtrfh, "$command" );
      waitpid $pid,0;             # reap the child (the waitpid fix being tested)
      binmode $rdrfh, ':crlf';
      local $/ = undef;           # slurp the whole output in one read

      my $content =  <$rdrfh>;
      my $mtime  = (stat $filename)[9];
      my $size = length $content;

      # Headers, then a blank line, then the document text
      print <<EOF;
Content-Length: $size
Last-Mtime: $mtime
Document-Type: TXT*
Path-Name: $filename

EOF
      print $content;
}

That is, it worked except for a number of particular files.  I now seem to
be getting tangled up in catdoc/Win32 issues.

I tried two versions of catdoc.

The first was the one which came with Swish-e 2.4. (Sounds like Dave did
some good work with this to get it to read long file names.)  Unfortunately,
for a few of my Word documents it produced only a string of question marks
when run from the command line, or sometimes some text followed by a string
of question marks. When called while indexing, it seemed to cause swish-e
to hang on one of these files.

I downloaded version 0.93.3 of catdoc from

This seemed to work better, except that it couldn't handle long filenames.
It also couldn't handle 10 of my files: it gave a "Bad BBD entry!" error
and froze in 9 out of the 10 cases.  The files it didn't work on were large
(20MB) with lots of JPEG images included (the staff newsletter!).
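
One way to work around the long-filename limitation is to hand catdoc a
copy of the file under a short name. The following is only a minimal sketch
of that idea, using the core File::Copy and File::Temp modules. The helper
name short_copy and the "cdXXXX" name pattern are made up for illustration,
and it assumes the target directory already exists and has a short path
itself.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper: copy $src into $dir under a short (8.3-style) name
# and return the new path. The caller deletes the copy when finished.
sub short_copy {
    my ($src, $dir) = @_;

    # tempfile() creates a uniquely named file such as cdAb3x.doc
    my ($fh, $tmp) = tempfile('cdXXXX', DIR => $dir, SUFFIX => '.doc');
    close $fh;

    # Overwrite the empty placeholder with the real document
    copy($src, $tmp) or die "copy $src -> $tmp failed: $!";
    return $tmp;
}
```

A caller would run catdoc on the returned path and then unlink it, while
still printing the original long name in the Path-Name header.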

I guess I just battle on. (I am getting around the long filenames by
copying the file somewhere else first, and I now have a list of files that
I will ignore.)  Any suggestions appreciated (e.g. a way to trap errors
from catdoc).
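
On trapping errors, here is a rough sketch of one possible approach rather
than a tested fix. It reads the child's output before calling waitpid (so
the child cannot block on a full pipe), then checks the exit status and
rejects output that is empty or mostly question marks. The helper name
run_converter and the 50% threshold are assumptions made for illustration.
Whether catdoc actually returns a non-zero exit status on "Bad BBD entry!"
is not known here, and this does nothing about a catdoc process that
freezes outright.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: run a converter command (given as a list) and
# return ($text, undef) on success or (undef, $reason) on failure.
sub run_converter {
    my (@cmd) = @_;
    my ($rdr, $wtr);

    # open2 raises an exception if the child cannot be started
    my $pid = eval { open2($rdr, $wtr, @cmd) };
    return (undef, "spawn failed: $@") unless $pid;

    close $wtr;                            # we send the child no input
    my $text = do { local $/; <$rdr> };    # read everything first...
    close $rdr;
    waitpid $pid, 0;                       # ...then reap the child

    my $exit = $? >> 8;
    return (undef, "exit status $exit") if $exit != 0;

    $text = '' unless defined $text;
    return (undef, "no output") if $text eq '';

    # Assumed heuristic: treat output that is mostly '?' as a failed conversion
    my $qmarks = ($text =~ tr/?//);
    return (undef, "output is mostly question marks")
        if $qmarks > 0.5 * length($text);

    return ($text, undef);
}
```

A caller could log the reason and add the file to the ignore list instead
of printing its headers, for example:
my ($text, $err) = run_converter($catdoc, $filename); next if defined $err;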


