Re: Swish-e max db size vs. Google App

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Apr 22 2005 - 19:10:32 GMT
On Fri, Apr 22, 2005 at 10:39:07AM -0700, Thomas Dowling wrote:
> Greetings--

Howdy!

> I'm working with some staff members here who are interested in what we
> could do with a Google Appliance.  My gut reaction is, "Not much that we
> couldn't do with a beefy Linux box and Swish-e", natch.  But I find that
> I don't really have a sense of Swish-e's upper limits in terms of the
> number of documents or size/number of indexes it can work with.

You have a Google Appliance for free?

> The home page says, "Swish-e is ideally suited for collections of a
> million documents or smaller."  I've seen posts on the list about 2GB+
> indexes of ~6 million documents under 2.5.x, along with a comment from
> Bill that that was pushing the envelope.  Does that reflect reasonable
> upper limits for current and forthcoming versions respectively?  Am I
> overlooking something obvious in the documentation?

I can't really answer.  Swish is not designed to scale to huge
collections -- for some value of huge.  Clearly if you have a lot more
RAM, disk, CPU, and time to wait you can index more.  Swish uses hash
tables that tend to slow down as they get larger.  Try using -S prog
with a program that generates random docs and reports the indexing rate
every few thousand files, and you can watch it slow down.
Using -e helps a lot (in the tests I did) but it still slows down
after a while.
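
For anyone who wants to try that test, here is a rough sketch of such a
generator in Python. It is not the program used for the tests mentioned
above. It assumes the -S prog input format is a "Path-Name:" header and
a "Content-Length:" header, a blank line, and then exactly that many
bytes of content; check the Swish-e docs for your version. The file
name gen_docs.py, the doc counts, and the word lengths are made up for
illustration.

```python
#!/usr/bin/env python3
"""Emit random documents on stdout for use with swish-e -S prog (sketch)."""
import random
import string
import sys
import time

TOTAL_DOCS = 10_000    # hypothetical default; raise it for a real test
REPORT_EVERY = 1_000   # print a timing line to stderr this often


def make_doc(rng, doc_id, n_words=100):
    """Return one document as bytes: headers, blank line, then the body."""
    words = (
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 9)))
        for _ in range(n_words)
    )
    body = (" ".join(words) + "\n").encode("ascii")
    # Assumed prog headers: Path-Name and Content-Length (length in bytes).
    header = f"Path-Name: doc{doc_id}.txt\nContent-Length: {len(body)}\n\n"
    return header.encode("ascii") + body


def main():
    rng = random.Random(0)        # fixed seed so runs are repeatable
    out = sys.stdout.buffer       # write bytes so the length is exact
    last = time.monotonic()
    for i in range(1, TOTAL_DOCS + 1):
        out.write(make_doc(rng, i))
        if i % REPORT_EVERY == 0:
            out.flush()
            now = time.monotonic()
            # stderr goes to the terminal, not into the index
            print(f"{i} docs, last {REPORT_EVERY} took {now - last:.2f}s",
                  file=sys.stderr)
            last = now


if __name__ == "__main__":
    main()
```

Run it with something like "swish-e -S prog -i ./gen_docs.py", with and
without -e. Writes to the pipe block while the indexer is busy, so the
per-batch times on stderr should roughly track how fast documents are
being consumed. Random words mean nearly every word is new, which is
harder on the hash tables than real text is likely to be.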

Searching also depends entirely on memory.

Hopefully others that use swish for large collections will reply.

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Apr 22 12:10:34 2005