On Fri, Apr 22, 2005 at 10:39:07AM -0700, Thomas Dowling wrote:
> I'm working with some staff members here who are interested in what we
> could do with a Google Appliance. My gut reaction is, "Not much that we
> couldn't do with a beefy Linux box and Swish-e", natch. But I find that
> I don't really have a sense of Swish-e's upper limits in terms of the
> number of documents or size/number of indexes it can work with.
You have a Google Appliance for free?
> The home page says, "Swish-e is ideally suited for collections of a
> million documents or smaller." I've seen posts on the list about 2GB+
> indexes of ~6 million documents under 2.5.x, along with a comment from
> Bill that that was pushing the envelope. Does that reflect reasonable
> upper limits for current and forthcoming versions respectively? Am I
> overlooking something obvious in the documentation?
I can't really answer. Swish is not designed to scale to huge
collections -- for some value of huge. Clearly, if you have a lot more
RAM, disk, CPU, and time to wait, you can index more. Swish uses hash
tables that tend to slow down as they get larger. Try using -S prog
with a program that generates random docs and reports indexing progress
every few thousand files, and you can watch it slow down.
Using -e (economy mode) helps a lot in the tests I did, but it still
slows down after a while.
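A rough sketch of such a random-document generator, in Python. It
assumes the -S prog input format is a few header lines (Path-Name,
Content-Length), a blank line, then exactly Content-Length bytes of
content; check that against the swish-e docs before relying on it. The
script name gen_docs.py, the /random/ paths, and all the constants are
made up for illustration. It prints a timing line to stderr every
REPORT_EVERY documents. Once the pipe to swish-e fills up, writes block,
so the per-batch time should grow as indexing slows down (it also
includes the generator's own time, which stays roughly constant).

```python
#!/usr/bin/env python3
"""Hypothetical random-document generator for swish-e's -S prog method.

Assumed output format per document: "Path-Name:" and "Content-Length:"
header lines, a blank line, then exactly Content-Length bytes of body.
"""
import random
import sys
import time

# Hypothetical tuning knobs; adjust to taste.
REPORT_EVERY = 5_000   # print a timing line after this many documents
WORDS_PER_DOC = 300    # words in each generated document
VOCAB_SIZE = 50_000    # distinct random "words" to draw from


def make_vocab(size, rng):
    """Build a vocabulary of random lowercase words, 3 to 10 letters long."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return ["".join(rng.choice(letters) for _ in range(rng.randint(3, 10)))
            for _ in range(size)]


def format_doc(path, body):
    """Return one document as bytes: headers, blank line, then the body."""
    data = body.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(data)}\n\n"
    return header.encode("ascii") + data


def generate(out, err, num_docs, report_every=REPORT_EVERY, seed=0):
    """Write num_docs random documents to the binary stream `out`,
    reporting elapsed time per batch to the text stream `err`."""
    rng = random.Random(seed)
    vocab = make_vocab(VOCAB_SIZE, rng)
    start = batch_start = time.monotonic()
    for i in range(1, num_docs + 1):
        body = " ".join(rng.choice(vocab) for _ in range(WORDS_PER_DOC))
        out.write(format_doc(f"/random/doc{i}.txt", body))
        if i % report_every == 0:
            out.flush()  # push the batch to the reader before timing it
            now = time.monotonic()
            err.write(f"{i} docs: last {report_every} took "
                      f"{now - batch_start:.2f}s ({now - start:.1f}s total)\n")
            err.flush()
            batch_start = now
    out.flush()


if __name__ == "__main__" and len(sys.argv) > 1:
    # The document count is the first command-line argument.
    generate(sys.stdout.buffer, sys.stderr, num_docs=int(sys.argv[1]))
```

To run it, make the script executable and name it with -i; the
SwishProgParameters config directive should be the way to hand it the
document count (e.g. SwishProgParameters 200000). With no argument it
writes nothing. Running it once with and once without -e should show
the economy-mode difference.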
Searching also depends entirely on memory.
Hopefully others that use swish for large collections will reply.
Received on Fri Apr 22 12:10:34 2005