Re: Segmentation fault while indexing with "StoreDescription"

From: José Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Tue Apr 20 2004 - 08:07:48 GMT

Hi Dietmar,

I see. Anyway, I don't think the number of files is the problem. I
reindex about 500,000 XML files every night using the XML2 parser (I am
using the current CVS version).
Have you tried HTML2 instead of HTML?
Which platform are you on (Linux, Solaris, ...)?
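
In case it helps, here is a rough sketch of the config quoted below with
the parser switched to HTML2. It assumes your swish-e binary was built
with libxml2 support, and it changes only the three directives that name
the parser; I have not run it against your files.

```
IndexDir ../../path1 ../../path2
IndexOnly .html
IndexReport 3
IndexFile ./test.swish-e
# Changed from HTML to HTML2 (the libxml2-based parser):
IndexContents HTML2 .html
DefaultContents HTML2
StoreDescription HTML2 <body> 2000
```

If the crash goes away with HTML2, that would point at the old HTML
parser rather than at the number of files.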

cu
Jose


Dietmar Rabich wrote:

>Hi Jose,
>
>it is a little bit difficult ... There are 3 paths with html files.
>
>path1: 3 html files
>path2: 38,278 html files
>path3: 64,168 html files
>
>The whole directory is 5,565,272 kByte - unzipped. And some of the documents
>are confidential.
>
>Here is an extract of the old header file:
>
># Swish-e format: 2.2.3
># 
># Name: ...
># Saved as: index-html.swish-e
># Counts: 501968 words, 105898 files
># Indexed on: 2004-04-16 05:09:12 CEST
># Description: ...
># Pointer: (no pointer)
># Maintained by: ...
># DocumentProperties: Enabled
># Stemming Applied: 0
># Soundex Applied: 0
># Fuzzy Indexing Mode: None
># IgnoreTotalWordCountWhenRanking: 1
># WordCharacters: ... (not changed)
># MinWordLimit: 2
># MaxWordLimit: 80
># BeginCharacters: ... (not changed)
># EndCharacters: ... (not changed)
># IgnoreFirstChar: 
># IgnoreLastChar: 
>
>I think there are far too many files for Swish-E 2.4.2. We've tried 2.4.1
>too, with the same result: segmentation fault.
>
>cu Dietmar.
>
>  
>
>>Hi Dietmar,
>>
>>(I cannot contact you directly because of your email address)
>>If possible, could you gzip "path1" and "path2" and make them
>>available to me so I can try them?
>>
>>cu
>>Jose
>>
>>Dietmar Rabich wrote:
>>
>>    
>>
>>>Some more information:
>>>
>>>Swish-E crashes in many other cases too. In each case there are many
>>>documents to be indexed. Here is an example:
>>>
>>>..
>>>Removing very common words...
>>>no words removed.
>>>Writing main index...
>>>Sorting words ...
>>>Sorting 170,500 words alphabetically
>>>Writing header ...
>>>Writing index entries ...
>>> Writing word text:  20%Segmentation fault
>>>
>>>cu Dietmar.
>>>
>>> 
>>>
>>>      
>>>
>>>>I have a problem while indexing HTML files. I have updated Swish-E from
>>>>version 2.2.3 to 2.4.2. Indexing with the old version works fine. Now I
>>>>get a "segmentation fault" message.
>>>>
>>>>The config file is simple:
>>>>
>>>>IndexDir ../../path1 ../../path2
>>>>IndexOnly .html
>>>>IndexReport 3
>>>>IndexFile ./test.swish-e
>>>>IndexContents HTML .html
>>>>DefaultContents HTML
>>>>StoreDescription HTML <body> 2000
>>>>...
>>>>        
>>>>
>
>  
>
Received on Tue Apr 20 01:07:49 2004