Roy Tennant wrote on 07/16/2009 07:44 PM:
> Does anyone have any experience yet with the Swish3 code to know if it
> will speed up indexing of very large data sets? Right now I'm using
> 2.4.7 to index 3.3 million tiny XML files and 2 million MARC records
> in XML (in separate jobs). Both take hours to finish on a server with
> 8 GB RAM. If I can get a significant performance boost with Swish3 I'd
> probably give a shot at beta testing it.
As I have noted before, libswish3-based apps like swish_xapian are not
likely to index any faster than Swish-e 2.x. In fact, I have yet to
find any FOSS IR project that indexes faster than Swish-e does, but
you trade speed for features. For example, you don't have to re-index
as often if you have reliable incremental indexing.
I just did a small test using about 80k docs (about 1.5G):
Swish-e 2.4.7 = 00:07:46
swish_xapian = 00:32:47
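To put those two timings on the same scale, here is a minimal sketch that converts them to seconds and takes the ratio. The helper name `to_seconds` is only illustrative; the two timing strings are the ones reported above.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Timings copied from the test above.
swish_e = to_seconds("00:07:46")
swish_xapian = to_seconds("00:32:47")

# How many times longer the swish_xapian run took.
ratio = swish_xapian / swish_e
print(f"{swish_e}s vs {swish_xapian}s, ratio {ratio:.1f}x")
# prints "466s vs 1967s, ratio 4.2x"
```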
Quite a difference. Note, though, that Swish-e 2.x by default (without
-e) does it all in RAM (except for properties) and only flushes to
disk at the end. Xapian flushes every N docs (default 10000), where N
is adjustable: set XAPIAN_FLUSH_THRESHOLD to something higher if you
have enough RAM to accommodate it.
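As a rough sketch of raising the flush threshold for a single indexing run, the wrapper below sets XAPIAN_FLUSH_THRESHOLD in the child process environment only. The helper names `build_env` and `run_indexer` are hypothetical, the value 50000 is just an illustration and not a recommendation, and the command line is deliberately left to the caller, since it depends on how you already invoke your indexer.

```python
import os
import subprocess


def build_env(threshold):
    """Return a copy of the current environment with the Xapian flush threshold set."""
    env = dict(os.environ)
    # Xapian reads this variable to decide how many docs to buffer before flushing.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_indexer(cmd, threshold):
    """Run an indexing command (a list of arguments) with the given flush threshold."""
    return subprocess.run(cmd, env=build_env(threshold), check=True)


# Illustrative value only; pick one that fits the RAM you have.
env = build_env(50000)
print(env["XAPIAN_FLUSH_THRESHOLD"])
```

Pass `run_indexer` whatever command you normally run, as an argument list. Because the variable is set only for the child process, the rest of your shell session keeps Xapian's default.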
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
gpg key: 37D2 DAA6 3A13 D415 4295 3A69 448F E556 374A 34D9
Users mailing list
Received on Fri Jul 17 16:22:44 2009