Re: [swish-e] Swish 3

From: Peter Karman <peter(at)>
Date: Fri Jul 17 2009 - 20:22:43 GMT

Roy Tennant wrote on 07/16/2009 07:44 PM:
> Does anyone have any experience yet with the Swish3 code to know if it
> will speed up indexing of very large data sets? Right now I'm using
> 2.4.7 to index 3.3 million tiny XML files and 2 million MARC records
> in XML (in separate jobs). Both take hours to finish on a server with
> 8 GB RAM. If I can get a significant performance boost with Swish3 I'd
> probably give a shot at beta testing it.

As I have noted before[0], libswish3-based apps like swish_xapian are 
not likely to be any faster at indexing than Swish-e 2.x. In fact, I 
have yet to find any FOSS IR project that indexes faster than Swish-e 
does. With the other projects you trade speed for features, though: 
with reliable incremental indexing, for example, you don't have to 
re-index as often.

I just did a small test using about 80k docs (about 1.5G) just for 

Swish-e 2.4.7 = 00:07:46
swish_xapian  = 00:32:47
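
To put a number on the gap, the two timings above can be converted to 
seconds and divided. This is only a small sketch of the arithmetic; the 
helper name hms_to_seconds is made up for illustration and is not part 
of either tool.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the test run reported above.
swish_e_seconds = hms_to_seconds("00:07:46")
swish_xapian_seconds = hms_to_seconds("00:32:47")

# How many times longer swish_xapian took on the same documents.
ratio = swish_xapian_seconds / swish_e_seconds
print(f"swish_xapian took about {ratio:.1f}x as long as Swish-e 2.4.7")
```

On these figures swish_xapian took roughly 4.2 times as long.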

Quite a difference. Note, though, that Swish-e 2.x by default (without 
-e) does it all in RAM (except for properties) and only flushes to disk 
at the end. Xapian flushes every N docs (default 10000), where N is 
adjustable: set the XAPIAN_FLUSH_THRESHOLD environment variable to 
something higher if you have enough RAM to accommodate it.
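
If you drive the indexer from a script, one way to raise the threshold 
is to set the variable only in the child process environment. The 
sketch below assumes exactly that; the helper names and the indexer 
command line passed to run_indexer are placeholders, not documented 
swish_xapian usage, and 50000 is an arbitrary value rather than a 
recommendation.

```python
import os
import subprocess


def env_with_flush_threshold(threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set."""
    if threshold <= 0:
        raise ValueError("threshold must be a positive document count")
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_indexer(command, threshold):
    """Run an indexer command (a placeholder argument list) with the
    raised flush threshold visible only to that child process."""
    env = env_with_flush_threshold(threshold)
    return subprocess.run(command, env=env, check=True)


# Example: build the environment without launching anything.
example_env = env_with_flush_threshold(50000, {"PATH": "/usr/bin"})
print(example_env["XAPIAN_FLUSH_THRESHOLD"])
```

Setting the variable per process keeps the higher RAM use confined to 
the indexing job instead of every Xapian program run from that shell.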

Peter Karman  .  .  peter(at)
gpg key: 37D2 DAA6 3A13 D415 4295  3A69 448F E556 374A 34D9
