Hi Jose,
that is a little bit difficult ... There are 3 paths with HTML files:
path1: 3 html files
path2: 38,278 html files
path3: 64,168 html files
The whole directory is 5,565,272 kByte unzipped, and some of the documents
are confidential.
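In case it helps to compare numbers without sharing the files themselves: figures like the ones above can be collected with a short script. This is only a minimal sketch, not part of Swish-E. The function name summarize_html is made up for illustration, and the two paths are simply the ones from the IndexDir line of the config, so adjust them as needed.

```python
import os


def summarize_html(root, suffix=".html"):
    """Return (file_count, total_bytes) for files under root ending in suffix."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable entry (e.g. broken symlink): skip it entirely.
                continue
            count += 1
    return count, total


if __name__ == "__main__":
    # Assumption: these are the directories named in the IndexDir line.
    for root in ("../../path1", "../../path2"):
        count, total = summarize_html(root)
        print(f"{root}: {count:,} html files, {total // 1024:,} kByte")
```

Only counts and sizes are printed, so nothing confidential leaves the machine.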
Here is an extract of the old header file:
# Swish-e format: 2.2.3
#
# Name: ...
# Saved as: index-html.swish-e
# Counts: 501968 words, 105898 files
# Indexed on: 2004-04-16 05:09:12 CEST
# Description: ...
# Pointer: (no pointer)
# Maintained by: ...
# DocumentProperties: Enabled
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Indexing Mode: None
# IgnoreTotalWordCountWhenRanking: 1
# WordCharacters: ... (not changed)
# MinWordLimit: 2
# MaxWordLimit: 80
# BeginCharacters: ... (not changed)
# EndCharacters: ... (not changed)
# IgnoreFirstChar:
# IgnoreLastChar:
I think there are far too many files for Swish-E 2.4.2. We've tried 2.4.1
too, with the same result: segmentation fault.
cu Dietmar.
> Hi Dietmar,
>
> (I cannot contact you directly because of your email address)
> If possible, can you gzip "path1" and "path2" and make them
> available to me so that I can try them?
>
> cu
> Jose
>
> Dietmar Rabich wrote:
>
> >Some more information:
> >
> >Swish-E crashes in many other cases too. In each case there are many
> >documents to be indexed. Here is an example:
> >
> >..
> >Removing very common words...
> >no words removed.
> >Writing main index...
> >Sorting words ...
> >Sorting 170,500 words alphabetically
> >Writing header ...
> >Writing index entries ...
> > Writing word text: 20%Segmentation fault
> >
> >cu Dietmar.
> >
> >
> >
> >>I have a problem while indexing HTML files. I have updated Swish-E from
> >>version 2.2.3 to 2.4.2. Indexing with the old version works fine. Now I
> >>get the message "segmentation fault".
> >>
> >>The config file is simple:
> >>
> >>IndexDir ../../path1 ../../path2
> >>IndexOnly .html
> >>IndexReport 3
> >>IndexFile ./test.swish-e
> >>IndexContents HTML .html
> >>DefaultContents HTML
> >>StoreDescription HTML <body> 2000
> >>...
Received on Tue Apr 20 00:11:15 2004