Re: Different number of indexed words when indexing large amount of data

From: Rodolfo Martinez <macr111080(at)not-real.yahoo.com.mx>
Date: Mon Apr 24 2006 - 16:04:07 GMT
Hi Bill,

Thanks for your response. I tried indexing just those files and got the same
keywords. I see this behavior only when indexing the whole data set. I have
hundreds (thousands?) of files in the same situation.

I extracted the keywords and saw how they differ, but that gave me no clue.
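
A comparison like this can be sketched in a few lines of Python. This is only
a rough sketch: it assumes the words from each run have already been extracted
into plain text files with one word per line (parsing the actual
-T indexed_words output is not shown), and the names diff_word_lists and
read_words are made up for illustration.

```python
import sys
from collections import Counter


def read_words(path):
    """Read one word per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


def diff_word_lists(words_a, words_b):
    """Return two Counters: surplus occurrences in A, and surplus in B.

    Counter subtraction keeps only positive counts, so a word that appears
    three times in A and twice in B shows up once in the first result.
    """
    count_a = Counter(words_a)
    count_b = Counter(words_b)
    return count_a - count_b, count_b - count_a


def main(argv):
    # Hypothetical usage: python diff_words.py run_small.txt run_full.txt
    if len(argv) != 3:
        return
    only_a, only_b = diff_word_lists(read_words(argv[1]), read_words(argv[2]))
    for label, surplus in (("only in first", only_a), ("only in second", only_b)):
        for word, count in sorted(surplus.items()):
            print(f"{label}: {word} (x{count})")


if __name__ == "__main__":
    main(sys.argv)
```

Counting occurrences rather than comparing sets matters here, because a
43 versus 40 word total could come from repeated words as well as from
words that are missing entirely.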

I have another question: does the previously indexed file affect the
current indexing process in some way?

Regards,
Rodolfo.

P.S. I could send the indexed files and the swish-e output if you want.

--- Bill Moseley <moseley@hank.org> wrote:

> On Thu, Apr 20, 2006 at 04:21:35PM -0700, Rodolfo Martinez wrote:
> > In dir "../disk2/Info/ebsp/apac/cn":
> >   benefits.htm - Using HTML2 parser -  (43 words)
> 
> 
> > In dir "../disk2/Info/ebsp/apac/cn":
> >   benefits.htm - Using HTML2 parser -  (40 words)
> 
> Try indexing just those files and use -T indexed_words to see how
> they differ.  Might give a clue.
> 
> -- 
> Bill Moseley
> moseley@hank.org
> 


Received on Mon Apr 24 09:04:21 2006