RE: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)>
Date: Wed Jun 22 2005 - 18:17:44 GMT
> From: Bill Moseley []
> Sent: Wednesday, June 22, 2005 12:31 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
> On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> > Would you mind giving some examples?  I've tried a multitude of
> > but I'm definitely not formulating the FileFilter directive
> > for my setup.
> > I've located catdoc.exe,, and  When I use the PM
> > files as the filter and run the indexer, it opens the .pm files up in
> > WordPad!
> That's nice of Windows to do that for you.  Where would WordPad open
> if you were indexing on a remote machine?
> > FileFilter .doc "perl.exe
> > e:/swish-e/lib/swish-e/perl/swish/filters/"
> What's  Do you mean  That's not a
> FileFilter.
> Can you find your way through a little Perl?
> What I'd try is using the program.  That should
> automatically filter for you.  It uses the SWISH::Filter module which
> deals with setting up filtering.
> You would likely need to edit to only fetch the files you
> want indexed, but it's not very hard to do.  Then you can run it like
> this:
>     perl /path/to/ /dir/to/index /other/dir > out.txt
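For reference, here is a rough sketch of the kind of output that program writes to out.txt. It is written in Python, not the Perl program Bill refers to (that program's name was lost from this message). It assumes swish-e's `-S prog` input format: per document, a few headers (`Path-Name`, `Content-Length`, `Last-Mtime`), a blank line, then exactly `Content-Length` bytes of content. The script name `emit_prog.py` and the functions `prog_record` and `emit_tree` are hypothetical. The sketch does no .doc/.pdf filtering at all; it only passes through files with the suffixes you list, and the filtering step is what SWISH::Filter would add.

```python
import os
import sys


def prog_record(path: str, body: bytes, mtime: int) -> bytes:
    """Build one document record in the header format assumed for swish-e -S prog."""
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # byte count of the body, not character count
        f"Last-Mtime: {mtime}\n"
        "\n"                              # blank line ends the headers
    )
    return header.encode("utf-8") + body


def emit_tree(roots, out, suffixes=(".txt",)):
    """Walk each root and write a record for every file with a wanted suffix.

    Hypothetical stand-in for the real spider program: no content filtering,
    files are passed through byte-for-byte.
    """
    wanted = tuple(s.lower() for s in suffixes)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.lower().endswith(wanted):
                    continue
                full = os.path.join(dirpath, name)
                with open(full, "rb") as fh:
                    body = fh.read()
                out.write(prog_record(full, body, int(os.path.getmtime(full))))


if __name__ == "__main__":
    # Usage: python emit_prog.py /dir/to/index /other/dir > out.txt
    emit_tree(sys.argv[1:], sys.stdout.buffer)
```

Content-Length is computed on the encoded bytes because swish-e reads exactly that many bytes per document; counting characters instead would break on any non-ASCII text.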

This worked pretty well on a small directory, but it bombed when I tried
to index the big muthah.

I'm getting a ton of these:

1048 Warning - //fileservername/folder/path/to/files/some-document.doc:
Use of uninitialized value in waitpid at
e:\swish-e\lib\swish-e\perl/SWISH/ line 1375.

I'm getting a bunch of these:

Failed to set content type for document

And right before it bombs, I get about a page of these:

Can't  opendir(//fileservername/folder/path/to/a/folder): Invalid
 at line 88
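The opendir message is cut off above, so the cause isn't clear from the log. One way to narrow it down is to walk the share separately and list every folder that can't be opened, then check those by hand (permissions, very long paths, or unusual characters are guesses, not confirmed causes). This is a minimal Python sketch using `os.walk` with an `onerror` callback; the script name `find_unreadable.py` and the function `unreadable_dirs` are hypothetical.

```python
import os


def unreadable_dirs(root):
    """Walk `root` and return (path, message) for each directory that can't be listed."""
    failures = []

    def record(err):
        # os.walk calls this with the OSError raised when listing a directory fails
        failures.append((err.filename, err.strerror or str(err)))

    for _dirpath, _dirnames, _filenames in os.walk(root, onerror=record):
        pass  # only the failures matter here
    return failures


if __name__ == "__main__":
    # Usage: python find_unreadable.py //fileservername/folder
    import sys

    for path, msg in unreadable_dirs(sys.argv[1]):
        print(f"{path}: {msg}")
```

Because `onerror` only records the error, the walk continues past bad folders instead of stopping at the first one, so a single run shows every failing directory.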

I've searched for other people hitting the same errors, but all the
reports I found predate the current release, so I don't know what's
already been fixed. No one else seems to be reporting the waitpid warning
at line 1375, so that code must have been rewritten recently. Also, as you
may have gathered, Perl isn't my forte, but I can find my way around it
when I need to.
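The waitpid warnings suggest a problem with a child process. My assumption, not checked against line 1375, is that SWISH::Filter runs external programs such as catdoc and then waits on them. A quick way to test whether the filter program itself fails on a given file is to run it directly, outside swish-e, and check its exit status. This minimal Python sketch does that with `subprocess.run`. The name `run_filter` and its `%p` path placeholder are this sketch's own conventions, not FileFilter syntax. It also reflects Bill's point above: a filter must be an executable program that writes text to stdout, which a .pm module is not.

```python
import subprocess


def run_filter(command, path, timeout=60):
    """Run an external filter on one file and return its stdout as text.

    `command` is a list of program + arguments; any argument that is exactly
    "%p" is replaced by the document path (a convention of this sketch).
    Raises RuntimeError if the program exits non-zero or times out.
    """
    argv = [path if arg == "%p" else arg for arg in command]
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"{argv[0]} timed out on {path}") from None
    if result.returncode != 0:
        stderr = result.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"{argv[0]} exited {result.returncode} on {path}: {stderr}")
    return result.stdout.decode(errors="replace")
```

For example, assuming catdoc.exe is on the PATH: `run_filter(["catdoc", "%p"], r"\\fileservername\folder\path\to\files\some-document.doc")`. If that fails for the same documents that produce warnings, the problem lies with catdoc or the file, not with the Perl side.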

> That fetches and filters your documents and writes to out.txt.  Try it
> on a small directory first, of course.  Then use your favorite editor
> to look at out.txt to make sure things are being filtered.
> Then you import that data into swish like this:
>     swish-e -S prog -c config -i stdin < out.txt
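With a large run, out.txt gets too big to eyeball in an editor. This is a minimal Python sketch of a checker to run before importing it. It makes the same assumption as the -S prog format above (headers, a blank line, then `Content-Length` bytes of body). It splits the file into records, flags any record whose body is shorter than its declared length, and lists documents that came out empty after filtering. The script name `check_prog.py` and the functions `read_records` and `report` are hypothetical.

```python
def read_records(data: bytes):
    """Split -S prog style output into (headers, body) pairs.

    Header names are lower-cased; each body is exactly Content-Length bytes.
    Raises ValueError if a record is malformed or truncated.
    """
    records, pos, n = [], 0, len(data)
    while True:
        # skip blank lines between records (or trailing ones at the end)
        while pos < n and data[pos:pos + 1] in (b"\n", b"\r"):
            pos += 1
        if pos >= n:
            break
        headers = {}
        while True:
            end = data.find(b"\n", pos)
            if end == -1:
                raise ValueError("headers not terminated by a blank line")
            line = data[pos:end].rstrip(b"\r")
            pos = end + 1
            if not line:  # blank line ends the headers
                break
            name, _, value = line.partition(b":")  # split on the first colon only
            headers[name.strip().decode("ascii", "replace").lower()] = (
                value.strip().decode("utf-8", "replace"))
        if "content-length" not in headers:
            raise ValueError(f"{headers.get('path-name')}: missing Content-Length")
        length = int(headers["content-length"])
        body = data[pos:pos + length]
        if len(body) != length:
            raise ValueError(f"{headers.get('path-name')}: body shorter than Content-Length")
        records.append((headers, body))
        pos += length
    return records


def report(records):
    """Return the paths whose body is empty or whitespace only."""
    return [h.get("path-name", "?") for h, body in records if not body.strip()]


if __name__ == "__main__":
    # Usage: python check_prog.py out.txt
    import sys

    with open(sys.argv[1], "rb") as fh:
        recs = read_records(fh.read())
    print(f"{len(recs)} documents")
    for path in report(recs):
        print(f"empty after filtering: {path}")
```

Splitting on the first colon only keeps Windows paths such as `e:\...` intact in the `Path-Name` value.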
> > OH - and another interesting tidbit: despite the fact that it's
> > supposedly NOT indexing Word documents, it apparently is indexing
> > of them.  Here's an example search result:
> We didn't say it wouldn't index them, but swish (and libxml2) probably
> don't do a very good job at parsing the native .doc format.
> > Last question: what should I be seeing instead of (null), and what
> > does that mean I have to do to get the output correct?  It does this
> > for PDF, RTF and DOC documents.
> Means you don't have a description defined.
> Bill Moseley
