Indexing of Word documents, stored on a UNIX box...

From: FISHER,JOSEPH (Non-HP-Roseville,ex1) <joseph_fisher(at)not-real.non.hp.com>
Date: Fri Aug 17 2001 - 18:34:19 GMT
Hi Bill,

My system is indexing incredibly fast...

One project down from 20 hours to 56 minutes, and another from 50 minutes to
3 minutes...

My search engine (JSWISHE) is working like a charm also...

But I have one more issue to discuss... (Until I find another one...)  ;^)))

I've got a number of Word documents that are being stored on my Unix-based
systems...

I've set up all of the environment variables as I did with the others...

When I index the documents, everything appears to go through just fine, with
the following exceptions:

	1) I get a warning message for each file being indexed:

		Warning: Possible embedded null in file '/case_cr_rpts/docs/dataload/xml_spec3.doc'

	2) Only 1 word is being sorted and indexed...

		Removing very common words...
		no words removed.
		Writing main index...
		Sorting words ...
		Sorting 1 words alphabetically
		Writing header ...
		Writing index entries ...
			Writing word text: Complete
			Writing word hash: Complete
			Writing word data: Complete
		1 unique word indexed.
		Writing file index...
		Writing file list ...
		Property Sorting complete.
		Writing sorted index ...
		33 files indexed.  2671616 total bytes.
		Elapsed time: 00:00:00 CPU time: 00:00:00
		Indexing done!
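
A quick way to confirm what the warning is pointing at is to count the null (0x00) bytes in one of the files directly. The snippet below is only a minimal sketch under my own assumptions: it assumes the warning is triggered by raw null bytes in the binary .doc file, and the function name count_null_bytes is made up for illustration and is not part of the indexer.

```python
import sys


def count_null_bytes(path, chunk_size=65536):
    """Return the number of NUL (0x00) bytes found in the file at `path`.

    Hypothetical helper for illustration only; reads the file in binary
    mode, chunk by chunk, so large documents are not loaded all at once.
    """
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\x00")
    return total


if __name__ == "__main__":
    # Usage: python count_nulls.py file1.doc file2.doc ...
    for name in sys.argv[1:]:
        print(name, count_null_bytes(name))
```

If the count is well above zero, the file is being read as raw binary rather than as text, which would be consistent with both the warning and the near-empty word list.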

Anyone have any ideas???

Thanks in advance, and have a great weekend...

Joe Fisher
Received on Fri Aug 17 19:00:00 2001