Comparison of indexing between stable and dev versions of SWISH-E

From: <m.chartoire(at)not-real.ipnl.in2p3.fr>
Date: Wed Aug 21 2002 - 14:10:45 GMT
 I am comparing the results of indexing a web site with the SWISH-E 
 stable version 2.0.5 and with swish-e-2.1-dev-25-2002-08-19.
 
 I use the -S http switch in both cases, and libxml2 for v2.1.
  
 Here is the end of the log for 2.0.5:
 
Removing very common words...
430 words removed.
0 words removed not in common words array:

Writing main index...
Computing hash table ...
Writing header ...
Writing index entries ...
Writing stopwords ...
17020 unique words indexed.
Writing file index...
Writing file list ...
Writing file offsets ...
Writing MetaNames ...
Writing offsets (2)...
1120 files indexed.
Running time: 574 minutes, 44 seconds.
Indexing done!

 and for 2.1-dev-25:

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 73095 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
73095 unique words indexed.
4 properties sorted.                                              
818 files indexed.  5226259 total bytes.  403805 total words.
Elapsed time: 00:14:49 CPU time: 00:00:06
Indexing done!


 In this test the running time is not significant (delay 30 for 2.0.5, 
 delay 1 for 2.1-dev-25).

 But judging by the number of indexed files, it seems that 2.1-dev-25 does 
 not index the equivalent servers. As you can see, I have the same 
 directives in both config files:

EquivalentServer http://lyoinfo.in2p3.fr http://snovae.in2p3.fr/ipnl
EquivalentServer http://lyoinfo.in2p3.fr http://doc.in2p3.fr/delphi/ipnl

 In the log of 2.1-dev-25 I can see:

Skipping http://doc.in2p3.fr/delphi/ipnl/:  Wrong method or server.
Skipping http://snovae.in2p3.fr/ipnl/:  Wrong method or server.

 In 2.0.5 the files of these two servers are indexed.
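
 To make the expected behaviour concrete, here is a minimal Python sketch of 
 how I understand the two EquivalentServer lines: a URL that starts with an 
 alias prefix is treated as if it lived under the first prefix on the same 
 line. This is only an illustration of the expectation, not SWISH-E's actual 
 code; the names canonical_url and same_server are hypothetical.

```python
# Hypothetical illustration of the expected EquivalentServer behaviour.
# This is NOT taken from SWISH-E; it only mirrors the two config lines above.
EQUIVALENT_SERVERS = [
    ["http://lyoinfo.in2p3.fr", "http://snovae.in2p3.fr/ipnl"],
    ["http://lyoinfo.in2p3.fr", "http://doc.in2p3.fr/delphi/ipnl"],
]


def canonical_url(url, groups=EQUIVALENT_SERVERS):
    """Rewrite url onto the first prefix of its group when it matches an alias."""
    for group in groups:
        canonical = group[0]
        for alias in group[1:]:
            # Match the alias itself or anything below it, but not
            # a longer name that merely shares the same leading characters.
            if url == alias or url.startswith(alias + "/"):
                return canonical + url[len(alias):]
    return url


def same_server(url, base="http://lyoinfo.in2p3.fr"):
    """True when url, after alias rewriting, belongs to the base server."""
    rewritten = canonical_url(url)
    return rewritten == base or rewritten.startswith(base + "/")


if __name__ == "__main__":
    for u in ("http://snovae.in2p3.fr/ipnl/",
              "http://doc.in2p3.fr/delphi/ipnl/"):
        print(u, "->", canonical_url(u), same_server(u))
```

 Under this reading, neither of the two skipped URLs should be rejected with 
 "Wrong method or server", since both rewrite to the base server.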

 It also seems that the removal of common words does not work in 2.1-dev-25.
 
 I don't understand the large difference in the number of unique words 
 indexed between the two versions.
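
 To put numbers on the difference, the figures can be pulled out of the two 
 log tails with a short Python sketch. The regular expressions are written 
 only against the summary lines quoted above, and the helper name summary is 
 hypothetical.

```python
import re

# Summary lines copied from the two logs quoted in this message.
LOG_205 = """17020 unique words indexed.
1120 files indexed.
"""
LOG_21 = """73095 unique words indexed.
818 files indexed.  5226259 total bytes.  403805 total words.
"""


def summary(log):
    """Extract the unique-word and file counts from a log tail."""
    words = int(re.search(r"(\d+) unique words indexed", log).group(1))
    files = int(re.search(r"(\d+) files indexed", log).group(1))
    return {"unique_words": words, "files": files}


old, new = summary(LOG_205), summary(LOG_21)
missing_files = old["files"] - new["files"]
word_ratio = new["unique_words"] / old["unique_words"]

print("files not indexed by 2.1-dev-25:", missing_files)  # 302
print("unique word ratio (2.1 / 2.0.5): %.2f" % word_ratio)  # 4.29
```

 So 2.1-dev-25 indexes 302 fewer files but reports about 4.3 times as many 
 unique words.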
 
 Any ideas?

-- 
Martial Chartoire, Service Informatique | E-mail: m.chartoire@ipnl.in2p3.fr
Institut de Physique Nucleaire de Lyon  | phone : +33 472 448 430
43, BD du 11 Novembre 1918              | fax   : +33 472 448 004
F 69622 Villeurbanne Cedex              |
Received on Wed Aug 21 14:14:21 2002