Re: Indexing of word documents, stored on a UNIX

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Aug 17 2001 - 19:03:45 GMT
At 11:31 AM 08/17/01 -0700, FISHER,JOSEPH (Non-HP-Roseville,ex1) wrote:
>When I index the documents, everything appears to go through just fine, with
>the following exceptions:
>
>	1) I get a warning message for each file being indexed:
>
>		Warning: Possible embedded null in file '/case_cr_rpts/docs/dataload/xml_spec3.doc'

Well, without seeing your config, I can't say for sure. To index Word documents you need to use a filter (or, if you are indexing with -S prog, add the filtering to your program).

http://sunsite.berkeley.edu/SWISH-E/2.2/docs/SWISH-CONFIG.html#Document_Filter_Directives

Don't call catdoc from a shell or Perl script; call catdoc directly, as shown in the example. A wrapper script starts an extra interpreter for every document, and that will kill your indexing speed.
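For the -S prog route, here is a minimal Python sketch (not something shipped with SWISH-E) of a hypothetical prog that runs catdoc directly through subprocess, with no intermediate shell, and writes each document to stdout. It assumes the prog record format is a Path-Name header and a Content-Length header (byte count), followed by a blank line and then the content, and that catdoc is on the PATH. The function names catdoc_text, prog_record and emit_documents are made up for illustration.

```python
import subprocess
import sys
from typing import BinaryIO, Callable, Iterable, Optional


def catdoc_text(path: str) -> bytes:
    """Convert one Word file to plain text by running catdoc directly."""
    # An argument list with the default shell=False execs catdoc itself:
    # no /bin/sh or perl process is started per document.
    result = subprocess.run(["catdoc", path], capture_output=True, check=True)
    return result.stdout


def prog_record(path: str, text: bytes) -> bytes:
    """Build one record: headers, a blank line, then the content (assumed format)."""
    # Content-Length is the byte count of the content, not a character count.
    header = f"Path-Name: {path}\nContent-Length: {len(text)}\n\n"
    return header.encode("utf-8") + text


def emit_documents(
    paths: Iterable[str],
    convert: Callable[[str], bytes] = catdoc_text,
    out: Optional[BinaryIO] = None,
) -> None:
    """Write a record for each path, filtering .doc files inside the prog."""
    sink = out if out is not None else sys.stdout.buffer
    for path in paths:
        if path.lower().endswith(".doc"):
            text = convert(path)  # Word file: filter to plain text first
        else:
            with open(path, "rb") as fh:
                text = fh.read()  # other files: pass through unchanged
        sink.write(prog_record(path, text))
    sink.flush()


if __name__ == "__main__":
    # Hypothetical invocation: file paths are passed on the command line.
    emit_documents(sys.argv[1:])
```

Passing the converter in as a parameter keeps the record formatting testable without catdoc installed, and the .doc files reach the indexer as plain text rather than raw binary with embedded nulls.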

Bill Moseley
mailto:moseley@hank.org
Received on Fri Aug 17 19:13:47 2001