Re: Strange indexing of word documents

From: Thomas Nyman <thomas(at)not-real.teg.pp.se>
Date: Tue Apr 12 2005 - 17:06:04 GMT
On 2005-04-12 at 18:43, Bill Moseley wrote:

> On Tue, Apr 12, 2005 at 09:34:19AM -0700, Thomas Nyman wrote:
>> I received help getting catdoc to work and all seemed well, or so I
>> thought. I'm not getting any error messages, but I have noticed that
>> certain Word documents are not being indexed. I have one test document
>> which simply contains the words Dog and Computer, but when I search for
>> "Dog" I receive no hits even though the document contains that word.
>> If I search on the document path and give the file name, swish-e finds
>> the file. It looks like it's being indexed, but if that's the case, why
>> am I not receiving any hits?
>>
>> FileFilter .doc /usr/bin/catdoc "-s8859-1 -d8859-1 '%p'"
>
> What happens when you run that on your document?
>
> What happens when you index just that one document and use
> "-T indexed_words"?
>
> -- 
> Bill Moseley
> moseley@hank.org
>
I figured out how to index just that one file with the option you suggested.

The output of the command "swish-e -i /usr/local/arkiv/pil.doc -c swish.conf -T indexed_words" is as follows:

Indexing Data Source: "File-System"
Indexing "/usr/local/arkiv/pil.doc"
     Adding:[1:swishdocpath(11)]   'pil'   Pos:1  Stuct:0x1 ( FILE )
     Adding:[1:swishdocpath(11)]   'doc'   Pos:2  Stuct:0x1 ( FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
2 unique words indexed.
5 properties sorted.
1 file indexed.  10 total bytes.  2 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
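
To make it easier to see which metanames actually received words in a trace like the one above, here is a minimal Python sketch that tallies the "Adding:" lines by metaname. The function name words_by_metaname and the regular expression are my own and only assume the line layout shown in this output; they are not part of swish-e. If only swishdocpath shows up, as it does here, then no words from the document body reached the indexer, which suggests looking at what the filter command prints for that file.

```python
import re
from collections import Counter

# Matches trace lines such as:
#   Adding:[1:swishdocpath(11)]   'pil'   Pos:1  Stuct:0x1 ( FILE )
# The layout is assumed from the output pasted above.
ADDING_RE = re.compile(
    r"Adding:\[\d+:(?P<meta>\w+)\(\d+\)\]\s+'(?P<word>[^']*)'"
)


def words_by_metaname(trace_text):
    """Count the words reported by -T indexed_words, grouped by metaname."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = ADDING_RE.search(line)
        if match:
            counts[match.group("meta")] += 1
    return counts


# Sample taken from the trace in this message.
sample = """\
Indexing "/usr/local/arkiv/pil.doc"
     Adding:[1:swishdocpath(11)]   'pil'   Pos:1  Stuct:0x1 ( FILE )
     Adding:[1:swishdocpath(11)]   'doc'   Pos:2  Stuct:0x1 ( FILE )
"""

counts = words_by_metaname(sample)
print(dict(counts))  # {'swishdocpath': 2}
```

In practice the trace could be saved to a file and passed to the same function; a healthy run on this test document should also show entries for the body words under some metaname other than swishdocpath.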
Received on Tue Apr 12 10:06:04 2005