Re: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Wed Jun 22 2005 - 15:53:12 GMT
> > 6. m$ word docs aren't indexing properly.  Unfortunately, I just noticed
> this and have not researched it at all.  I just ran the index again on a
> subdirectory and noticed that all word docs are showing that only 1 word
> gets indexed.  Here's the config file:
> > In dir "z:/subdirectory/subsub":
> >   Word doc 1.doc - Using DEFAULT (HTML2) parser -  (1 words)
> >   Word doc 2.doc - Using DEFAULT (HTML2) parser -  (1 words)
> 
> Just need to setup a FileFilter directive that uses catdoc, wvware, a
> SWISH::Filter script or some other word converter.
> 

Would you mind giving some examples?  I've tried a multitude of things
but I'm definitely not formulating the FileFilter directive correctly
for my setup.

I've located catdoc.exe, doc2txt.pm, and doc2html.pm.  When I use the .pm
files directly as the filter and run the indexer, Windows opens the .pm
files in WordPad!  I then tried passing them as parameters to perl, i.e.

FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"

This didn't raise an error, but each Word doc was followed by "(no words
indexed)".

I also tried cutting and pasting from the documentation to use the
catdoc method, but even though I changed the path, it says it can't find
the executable.
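
For reference, a catdoc-based filter in the config file generally has the
shape sketched below.  This is only an illustration: the catdoc.exe path is
a placeholder, and the charset flags and the Windows-style double quotes
around %p are assumptions that may need adjusting for a given setup.

```
# Hypothetical path; point this at wherever catdoc.exe actually lives.
# %p is replaced by swish-e with the full path of the file being filtered.
FileFilter .doc "c:/path/to/catdoc.exe" '-s8859-1 -d8859-1 "%p"'

# catdoc emits plain text, so parse filtered .doc output as text
# rather than with the default HTML parser.
IndexContents TXT* .doc
```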

Oh, and another interesting tidbit: despite the fact that it's
supposedly NOT indexing Word documents, it apparently is indexing some
of them.  Here's an example search result:

1 October PSO minutes.doc -- rank: 1000
    (null)
    Last Modified Date:	1998-11-18 14:20:48 Eastern Standard Time
    Document Size:	712512
    Document Path:	file://fileservername/subfolder//path/to/October PSO minutes.doc

Last question: what should I be seeing instead of (null), and what do I
have to do to get the output correct?  It shows (null) for PDF, RTF, and
Word documents.

Thanks,
Jim
Received on Wed Jun 22 08:53:15 2005