I'm cc'ing the list on this since my reply will likely be of some use to others.
Kevin Bowling wrote on 2/4/09 9:33 PM:
> I have a fairly large collection that I index using Swish-e 2.4.5. It works
> fairly well, but the indexing speed seems average (only one core) and the
> search results are just mediocre. Obviously that version is quite old and
> could use some updates. I am hoping Swish3 will be that update.
Swish3 won't be any faster at indexing, at least as it stands now. Xapian is a
good deal slower than Swish-e in my tests. Whether the Xapian-backed Swish3 will
be any improvement on search results depends, I guess, on what you mean by
"search results are mediocre". Speed should be comparable. Ranking should be somewhat
What Swish3 does that Swish-e 2.4.x does not is offer native UTF-8 and
incremental indexing support, scalable index size (Swish-e doesn't scale well
past about 1M docs), plus search bindings in many different languages (not just
C and Perl).
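"Incremental indexing" means a single document can be added, replaced, or removed without rebuilding the whole index. Here is a minimal Python sketch of that idea: a toy inverted index that remembers which terms each document contributed, so those postings can be undone later. The class and method names (`IncrementalIndex`, `add`, `remove`, `search`) are hypothetical. This is not Swish3's or Xapian's actual API. It also assumes simple word-character tokenization. Python 3's `\w` is Unicode-aware for `str`, so decoded UTF-8 text such as "café" tokenizes as a single word.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Hypothetical toy inverted index illustrating incremental updates (not Swish3's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids containing it
        self.doc_terms = {}               # doc id -> terms it contributed (lets us undo an add)

    @staticmethod
    def tokenize(text):
        # \w matches Unicode word characters for str patterns in Python 3
        return set(re.findall(r"\w+", text.lower()))

    def add(self, doc_id, text):
        # Re-indexing an existing document: drop its old postings first,
        # so only this one document is touched, not the whole index.
        if doc_id in self.doc_terms:
            self.remove(doc_id)
        terms = self.tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # keep the term dictionary from growing with dead entries

    def search(self, term):
        # .get avoids creating empty postings lists for unknown terms
        return sorted(self.postings.get(term.lower(), set()))


if __name__ == "__main__":
    idx = IncrementalIndex()
    idx.add("a.html", "Swish-e indexes files")
    idx.add("b.txt", "Xapian indexes documents")
    print(idx.search("indexes"))  # ['a.html', 'b.txt']
    idx.add("a.html", "updated page")  # replace one doc in place
    print(idx.search("indexes"))  # ['b.txt']
```

A real engine stores postings on disk and has to handle concurrent readers. The core bookkeeping is the same, though: you must be able to find and retract a document's old entries before writing its new ones.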
So there are tradeoffs. Swish-e 2.4.x is about as fast as you can get with
respect to indexing and search speed. Swish3 trades some speed for full UTF-8
support and a lot more flexibility and scalability.
> What I am confused about is that it now uses Xapian. I haven't tried Xapian
> but I know they have their own system called Omega. How does Swish3 differ
> from it? I just need a local filesystem indexer for a website with 200k+
> HTML, PDF, TXT and PS files. Are Swish-e and Omega the only two FOSS
Oh no. There are many. Lucene and its clones. KinoSearch. HyperEstraier (though
it seems to have fallen out of support). There are many others.
Swish3 offers a few things that Omega does not: MetaNames and PropertyNames, for
one; via SWISH::Prog, an aggregation framework for HTTP, mail, and RDBMS sources
as well as the filesystem; and a single config file.
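For anyone who hasn't used Swish-e, here is the rough distinction. MetaNames are fields whose words you can search within. PropertyNames are fields whose values are stored and returned with results, and can be used for sorting. The Python sketch below is a toy model of that split only. `FieldedIndex` and its methods are hypothetical and are not the Swish-e, Swish3, or SWISH::Prog API. Tokenization is assumed to be a plain lowercase whitespace split.

```python
from collections import defaultdict


class FieldedIndex:
    """Hypothetical toy model of the MetaName/PropertyName split (not Swish3's API)."""

    def __init__(self, metanames, propertynames):
        self.metanames = set(metanames)          # fields whose words are searchable
        self.propertynames = set(propertynames)  # fields whose raw values are stored for output/sorting
        self.postings = defaultdict(set)         # (field, word) -> doc ids
        self.properties = {}                     # doc id -> {field: stored value}

    def add(self, doc_id, fields):
        for name, value in fields.items():
            if name in self.metanames:
                for word in value.lower().split():
                    self.postings[(name, word)].add(doc_id)
        # Only PropertyName fields are kept verbatim for display/sorting.
        self.properties[doc_id] = {
            name: value for name, value in fields.items() if name in self.propertynames
        }

    def search(self, field, word, sort_by=None):
        hits = self.postings.get((field, word.lower()), set())
        results = [(doc_id, self.properties[doc_id]) for doc_id in hits]
        if sort_by is not None:
            results.sort(key=lambda r: r[1].get(sort_by, ""))  # sort on a stored property
        else:
            results.sort(key=lambda r: r[0])
        return results


if __name__ == "__main__":
    idx = FieldedIndex(metanames=["title", "body"], propertynames=["title", "date"])
    idx.add("a.html", {"title": "Swish-e FAQ", "body": "indexing tips", "date": "2009-02-04"})
    idx.add("b.html", {"title": "Xapian notes", "body": "indexing with Xapian", "date": "2008-12-01"})
    for doc_id, props in idx.search("body", "indexing", sort_by="date"):
        print(doc_id, props)
    print(idx.search("date", "2009-02-04"))  # [] because date is a property, not a MetaName
```

Note that a field can be both. Here `title` is searchable and also returned with each hit, while `date` is only stored, so it can sort results but cannot be searched.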
If all you need is a local filesystem indexer for a website with 200k+ docs
(which I would call medium-sized -- these days folks deal with multi-million doc
collections), and you don't need UTF-8 or incremental indexing, Swish-e 2.4.5 is
about as good as it gets. Don't let its age fool you. :)
FWIW, current SVN has some fixes/improvements over 2.4.5. There's actually a
2.4.6 tagged version that just hasn't ever made it to a fully-announced release
since we had some problems with the Windows build.
hope that helps.
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Users mailing list
Received on Wed Feb 4 23:49:43 2009