RE: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Wed Jun 22 2005 - 18:17:44 GMT
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 12:31 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
> 
> On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> > Would you mind giving some examples?  I've tried a multitude of things
> > but I'm definitely not formulating the FileFilter directive correctly
> > for my setup.
> >
> > I've located catdoc.exe, doc2txt.pm, and doc2html.pm.  When I use the
> > PM files as the filter and run the indexer, it opens the pm files up in
> > word pad!
> 
> That's nice of Windows to do that for you.  Where would Wordpad open
> if you were indexing on a remote machine?
> 
> >
> > FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"
> 
> What's doc2html.pm?  Do you mean Doc2html.pm?  That's not a
> FileFilter.
> 
> Can you find your way through a little Perl?
> 
> What I'd try is using the DirTree.pl program.  That should
> automatically filter for you.  It uses the SWISH::Filter module which
> deals with setting up filtering.
> 
> You would likely need to edit DirTree.pl to only fetch the files you
> want indexed, but it's not very hard to do.  Then you can run it like
> this:
> 
>     perl /path/to/DirTree.pl /dir/to/index /other/dir > out.txt

This worked with a small directory pretty well, but bombed when I tried
to index the big muthah.  

I'm getting a ton of these:

1048 Warning - //fileservername/folder/path/to/files/some-document.doc:
Use of uninitialized value in waitpid at
e:\swish-e\lib\swish-e\perl/SWISH/Filter.pm line 1375.

I'm getting a bunch of these:

Failed to set content type for document
'//fileservername/folder/path/to/files/some-document.doc'

And right before it bombs I get about 1 page of these: 

Can't  opendir(//fileservername/folder/path/to/a/folder): Invalid
argument
 at dirtree.pl line 88

I've tried to find other people with the same problem, but every report
I found predates the current release, so I don't know what's been fixed
since.  No one seems to have hit the waitpid warning at line 1375, so
that code must be a rewrite.  Also, as you may have ascertained, Perl
isn't my forte, but I can find my way around it when I need to.
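
For reference, the opendir failures above abort the whole run. A minimal sketch of the alternative behavior follows: walk the tree, keep only the wanted extensions, and skip any directory that cannot be opened. It is written in Python purely for illustration and is not DirTree.pl's actual code; the function name walk_documents and the extension list are hypothetical.

```python
import os
import sys


def walk_documents(root, extensions=(".doc", ".pdf", ".rtf")):
    """Yield paths under root whose file extension is in extensions.

    Directories that cannot be listed are reported on stderr and skipped,
    so one bad folder does not stop the rest of the walk.
    """
    def report(err):
        # os.walk passes the OSError raised while listing a directory.
        print(f"skipping {err.filename}: {err.strerror}", file=sys.stderr)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=report):
        for name in filenames:
            # Case-insensitive match, since Windows shares mix .DOC and .doc.
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)
```

Whether skipping is acceptable depends on why the directory could not be opened; the skipped paths on stderr are worth reviewing after a run.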


> 
> That fetches and filters your documents and writes to out.txt.  Try it
> on a small directory first, of course.  Then use your favorite editor
> to look at out.txt to make sure things are being filtered.
> 
> Then you import that data into swish like this:
> 
>     swish-e -S prog -c config -i stdin < out.txt
> 
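
When inspecting out.txt it helps to know roughly what a record looks like. To the best of my understanding of the swish-e "prog" input format, each document is a small block of headers (at least Path-Name and Content-Length), a blank line, and then exactly Content-Length bytes of content. The sketch below, in Python for illustration only, writes one such record; write_record is a hypothetical helper, and the swish-e documentation should be checked for the full header list.

```python
import sys


def write_record(out, path, text):
    """Write one document record to a binary stream.

    Layout assumed here: header lines, one blank line, then the body.
    Content-Length counts bytes, not characters, so the body is encoded
    first and measured afterwards.
    """
    body = text.encode("utf-8")
    headers = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    out.write(headers.encode("utf-8"))
    out.write(body)


if __name__ == "__main__":
    # sys.stdout.buffer is the binary stream, which avoids newline
    # translation on Windows changing the byte count.
    write_record(sys.stdout.buffer, "/dir/to/index/example.txt", "hello world\n")
```

If the byte count and the actual content length disagree, the indexer will read the next record's headers from the wrong offset, so a mismatch is the first thing to look for when out.txt seems garbled.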
> > OH - and another interesting tidbit: despite the fact that it's
> > supposedly NOT indexing word documents, it apparently is indexing some
> > of them.  Here's an example search result:
> 
> We didn't say it wouldn't index them, but swish (and libxml2) probably
> don't do a very good job at parsing the native .doc format.
> 
> > Last question: what should I be seeing instead of (null), and what
> > does that mean I have to do to get the output correct?  It does this
> > for documents of pdf, rtf and doc.
> 
> Means you don't have a description defined.
> 
> --
> Bill Moseley
> moseley@hank.org
Received on Wed Jun 22 11:17:46 2005