RE: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Wed Jun 22 2005 - 16:42:32 GMT
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 12:31 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
> 
> On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> > Would you mind giving some examples?  I've tried a multitude of things
> > but I'm definitely not formulating the FileFilter directive correctly
> > for my setup.
> >
> > I've located catdoc.exe, doc2txt.pm, and doc2html.pm.  When I use the PM
> > files as the filter and run the indexer, it opens the pm files up in
> > word pad!
> 
> That's nice of Windows to do that for you.  Where would Wordpad open
> if you were indexing on a remote machine?

Indeed.  'tis a silly OS.  Oh, and the answer is that wordpad would
cross over into our world and go on a murdering rampage.

> 
> >
> > FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"
> 
> What's doc2html.pm?  Do you mean Doc2html.pm?  That's not a
> FileFilter.

OK, Mr. Torvalds, have it your way.  Haha.  Thanks for the info.

> 
> Can you find your way through a little Perl?

I will try.

> 
> What I'd try is using the DirTree.pl program.  That should
> automatically filter for you.  It uses the SWISH::Filter module which
> deals with setting up filtering.

Cool.

> 
> You would likely need to edit DirTree.pl to only fetch the files you
> want indexed, but it's not very hard to do.  Then you can run it like
> this:
> 
>     perl /path/to/DirTree.pl /dir/to/index /other/dir > out.txt
> 
> That fetches and filters your documents and writes to out.txt.  Try it
> on a small directory first, of course.  Then use your favorite editor
> to look at out.txt to make sure things are being filtered.
> 
> Then you import that data into swish like this:
> 
>     swish-e -S prog -c config -i stdin < out.txt
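
For anyone inspecting out.txt before importing it: as far as I understand the -S prog input, each document is a short header block (at least Path-Name and Content-Length), one blank line, then exactly Content-Length bytes of content.  Here is a minimal shell sketch of that layout.  The function name emit_doc and the sample path and text are made up for illustration; they are not part of DirTree.pl or swish-e.

```shell
# Emit one document in the header-plus-body layout read by "swish-e -S prog".
# emit_doc and its two arguments are hypothetical names for this sketch only.
emit_doc() {
    path=$1
    text=$2
    # Content-Length has to be the byte count of the body that follows.
    length=$(printf '%s' "$text" | wc -c | tr -d ' ')
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$length"
    # A blank line separates the headers from the body.
    printf '\n'
    printf '%s' "$text"
}
```

Calling emit_doc docs/a.txt "hello" writes the two headers, a blank line, and the five-byte body.  If out.txt from DirTree.pl looks roughly like that for each file, with readable text rather than binary junk in the body, the filtering is probably working.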
> 
> > OH - and another interesting tidbit: despite the fact that it's
> > supposedly NOT indexing Word documents, it apparently is indexing some
> > of them.  Here's an example search result:
> 
> We didn't say it wouldn't index them, but swish (and libxml2) probably
> don't do a very good job at parsing the native .doc format.

Gotcha.

> 
> > Last question: what should I be seeing instead of (null), and what
> > do I have to do to get the output correct?  It does this for
> > pdf, rtf and doc documents.
> 
> Means you don't have a description defined.

I was too quick in asking the question - I found that I had to add 

Metanames swishtitle swishdocpath

to my config.  Good thing for this searchable email archive.

> 
> --
> Bill Moseley
> moseley@hank.org
Received on Wed Jun 22 09:42:39 2005