> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 12:31 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
>
> On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> > Would you mind giving some examples? I've tried a multitude of things
> > but I'm definitely not formulating the FileFilter directive correctly
> > for my setup.
> >
> > I've located catdoc.exe, doc2txt.pm, and doc2html.pm. When I use the PM
> > files as the filter and run the indexer, it opens the pm files up in
> > word pad!
>
> That's nice of Windows to do that for you. Where would Wordpad open
> if you were indexing on a remote machine?
Indeed. 'Tis a silly OS. Oh, and the answer is that WordPad would
cross over into our world and go on a murdering rampage.
>
> >
> > FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"
>
> What's doc2html.pm? Do you mean Doc2html.pm? That's not a
> FileFilter.
OK, Mr. Torvalds, have it your way. Haha. Thanks for the info.
>
> Can you find your way through a little Perl?
I will try.
>
> What I'd try is using the DirTree.pl program. That should
> automatically filter for you. It uses the SWISH::Filter module which
> deals with setting up filtering.
Cool.
>
> You would likely need to edit DirTree.pl to only fetch the files you
> want indexed, but it's not very hard to do. Then you can run it like
> this:
>
> perl /path/to/DirTree.pl /dir/to/index /other/dir > out.txt
>
> That fetches and filters your documents and writes to out.txt. Try it
> on a small directory first, of course. Then use your favorite editor
> to look at out.txt to make sure things are being filtered.
>
> Then you import that data into swish like this:
>
> swish-e -S prog -c config -i stdin < out.txt
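For anyone else reading the archive: as I understand it, the stream that -S prog reads is one header block per document (Path-Name and Content-Length, then a blank line) followed by exactly that many bytes of content. Here is a rough sketch for pushing a few plain-text files through by hand to test the import step. The emit_doc name is my own invention, not part of swish-e, and this does no filtering at all, so it is no substitute for DirTree.pl.

```shell
#!/bin/sh
# Sketch only: writes files in the header format that "swish-e -S prog" reads.
# emit_doc is a hypothetical helper name, not something shipped with swish-e.
# Assumes the files are already plain text; no SWISH::Filter step happens here.

emit_doc() {
    path=$1
    # Content-Length is the size of the content in bytes.
    size=$(wc -c < "$path" | tr -d '[:space:]')
    # Header lines, then a blank line, then the content itself.
    printf 'Path-Name: %s\nContent-Length: %s\n\n' "$path" "$size"
    cat "$path"
}

# Emit every regular file named on the command line.
for f in "$@"; do
    if [ -f "$f" ]; then
        emit_doc "$f"
    fi
done
```

Saved as, say, emit.sh (also a made-up name), it could be run as "sh emit.sh notes.txt > out.txt" and the result fed to the swish-e command above.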
>
> > OH - and another interesting tidbit: despite the fact that it's
> > supposedly NOT indexing Word documents, it apparently is indexing some
> > of them. Here's an example search result:
>
> We didn't say it wouldn't index them, but swish (and libxml2) probably
> don't do a very good job at parsing the native .doc format.
Gotcha.
>
> > Last question: what should I be seeing instead of (null), and what
> > do I have to do to get the output correct? It does this for
> > pdf, rtf and doc documents.
>
> Means you don't have a description defined.
I was too quick in asking the question - I found that I had to add
    MetaNames swishtitle swishdocpath
to my config. Good thing for this searchable email archive.
>
> --
> Bill Moseley
> moseley@hank.org
Received on Wed Jun 22 09:42:39 2005