
Re: Different number of indexed words when indexing

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Apr 26 2006 - 13:44:20 GMT
For our info, what version of libxml2 were you using?

Rodolfo Martinez scribbled on 4/26/06 8:38 AM:
> Hi,
> 
> I found what was causing this behavior: it was the libxml2 library.
> 
> I replaced TXT2 and HTML2 with TXT and HTML, respectively, in the configuration
> file. Now I _always_ get the same number of indexed words.
> 
> SWISH-E's internal parsers require _much_ more memory than the libxml2 parser,
> but they always work as expected.
> 
> Thanks again for your support,
> Rodolfo
> 
> --- Bill Moseley <moseley@hank.org> wrote:
> 
>> On Mon, Apr 24, 2006 at 08:56:51AM -0700, Rodolfo Martinez wrote:
>>> Hi Bill,
>>>
>>> Thanks for your response. I tried indexing just those files and got the same
>>> keywords. I got this behavior only when indexing all information. I have
>>> hundreds (thousands?) of files in the same situation.
>>>
>>> I extracted the keywords and saw how they differ, but that gave me no clue.
>> Then maybe it's the count that is suspect?  I'm not sure what to tell
>> you.
>>
>>> I have another question: does the previously indexed file affect the
>>> current indexing process in some way?
>> Nope.
>>
>> -- 
>> Bill Moseley
>> moseley@hank.org
>>
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Apr 26 06:44:20 2006