At 02:55 PM 08/17/01 -0700, FISHER,JOSEPH (Non-HP-Roseville,ex1) wrote:
>Hi Bill,
>
>Ok, I understand that I need to include a filter file in order to index the
>contents of MS Word documents stored on a Unix system... (As I understand
>it, this was NOT necessary under SWISH 1.3...)
That's always been the case: Swish-e has never parsed Word documents natively.
Rainer added the filter feature to allow indexing other document types.
>I've downloaded and compiled "catdoc"... Catdoc is even referenced in one of
>the filter files under SWISH-E 2.1...
>
> .../filter-bin/_doc2text.sh
Again, I would not advise using a shell script as the filter, for performance reasons: it starts an extra shell for every document indexed.
>I've installed it in its default location, and made sure that the filter
>file is pointing to the correct directory structure...
>
>But which configuration file should I modify to make SWISH-E see this MS
>Word filter file?
What config files do you have?
The example in the reference SWISH-CONFIG document I posted shows:

    FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'"

That line goes in your swish configuration file.
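Roughly, for each .doc file swish-e runs the filter program with the options string, with %p standing in for the document's path, and indexes whatever the program prints to standard output. Here is a minimal Python sketch of that expansion step, assuming the options are split shell-style (which is why '%p' is wrapped in single quotes). `build_filter_command` and `run_filter` are hypothetical names for illustration, not part of swish-e:

```python
import shlex
import subprocess
from typing import List

def build_filter_command(prog: str, options: str, path: str) -> List[str]:
    """Expand a FileFilter-style options string for one document.

    Assumption: %p is replaced by the document path and the result is split
    shell-style, so the single quotes keep a path containing spaces intact.
    """
    expanded = options.replace("%p", path)
    return [prog] + shlex.split(expanded)

def run_filter(cmd: List[str]) -> str:
    """Run the filter and return its stdout, i.e. the text that gets indexed.

    catdoc is told to write ISO-8859-1 (-d8859-1), so decode it that way.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("iso-8859-1")

cmd = build_filter_command(
    "/usr/local/bin/catdoc",
    "-s8859-1 -d8859-1 '%p'",
    "/home/docs/Annual Report.doc",
)
print(cmd)
# ['/usr/local/bin/catdoc', '-s8859-1', '-d8859-1', '/home/docs/Annual Report.doc']
```

With catdoc installed, `run_filter(cmd)` would return the document's plain text.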
So your swish.conf might contain:

    IndexOnly .html .htm .doc .txt
    IndexContents HTML .html .htm
    IndexContents TXT .doc .txt
    FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'"

and then you would run:

    ./swish-e -c swish.conf -i /home/docs
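Note that .doc is listed under the TXT parser: as I understand the filter feature, swish-e parses catdoc's plain-text output, not the raw Word file. Here is a rough Python model of how those four directives combine for a single file, assuming exact (case-sensitive) suffix matching. The tables and the `plan` function are hypothetical illustrations, not swish-e internals:

```python
import os
from typing import Optional, Tuple

# Hypothetical model of the example swish.conf; this is NOT swish-e's own code.
INDEX_ONLY = {".html", ".htm", ".doc", ".txt"}    # IndexOnly
PARSER_FOR = {".html": "HTML", ".htm": "HTML",    # IndexContents HTML
              ".doc": "TXT", ".txt": "TXT"}       # IndexContents TXT
FILTER_FOR = {".doc": "/usr/local/bin/catdoc"}    # FileFilter

def plan(path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (parser, filter program or None), or None if the file is skipped."""
    ext = os.path.splitext(path)[1]
    if ext not in INDEX_ONLY:
        return None  # IndexOnly: files with other suffixes are not indexed
    # Filtered files are parsed by the parser named for their suffix.
    return PARSER_FOR[ext], FILTER_FOR.get(ext)

for p in ["/home/docs/report.doc", "/home/docs/index.html", "/home/docs/scan.pdf"]:
    print(p, plan(p))
```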
If the documentation is unclear please say so, and what you think needs to
be changed or is confusing.
Bill Moseley
mailto:moseley@hank.org
Received on Fri Aug 17 22:37:58 2001