Thomas Dowling scribbled on 4/22/05 12:42 PM:
> I'm working with some staff members here who are interested in what we
> could do with a Google Appliance. My gut reaction is, "Not much that we
> couldn't do with a beefy Linux box and Swish-e", natch. But I find that
> I don't really have a sense of Swish-e's upper limits in terms of the
> number of documents or size/number of indexes it can work with.
> The home page says, "Swish-e is ideally suited for collections of a
> million documents or smaller." I've seen posts on the list about 2GB+
> indexes of ~6 million documents under 2.5.x, along with a comment from
> Bill that that was pushing the envelope. Does that reflect reasonable
> upper limits for current and forthcoming versions, respectively? Am I
> overlooking something obvious in the documentation?
Google Appliance vs. Swish-e isn't really a fair comparison. Even if you could
index several million docs with Swish-e without a performance hit (and you will
see one, as Bill noted), the two tools have different strengths and weaknesses.
Google is really good at indexing LOTS of docs quickly, and at searching them
even more quickly. With the appliance you're getting hardware that's tuned just
for Google's software, and you're paying through the nose for it, though that
price includes support from Google. You get what you pay for.
Swish-e is really good at indexing what I think of as medium-size collections.
Sure, it can probably do several million docs, but not nearly at the rate that
Google can.

But the real defining issue for me is the ranking/accuracy of the search.
Swish-e gives you exactly what you asked for, and because you can customize the
MetaNames/PropertyNames in endless combinations, you can run very specific
queries. Google, on the other hand, has its vaunted PageRank system: highly
secretive, proprietary, and pretty good. Not perfect, but pretty good. The
secret lies in rating the relative importance of any given doc by how the rest
of the docs link to it (the exact algorithm Google uses is the company secret).
Swish-e ranking doesn't even pretend to do that. It's very simplistic, scoring
each doc on where and how often your search words appear in it, though fairly
useful, depending on what you're trying to find.
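To make "relative importance as rated by the rest of the docs" concrete, here is
a minimal Python sketch of the publicly described PageRank idea (power iteration
over a link graph). It is not Google's production ranking, which is proprietary;
the toy link graph, the `pagerank` function name, the 0.85 damping factor and
the fixed 50 iterations are all assumptions for illustration.

```python
# Minimal PageRank sketch (power iteration). Assumption: a toy link graph
# given as {page: [pages it links to]}; NOT Google's proprietary ranking.


def pagerank(links, damping=0.85, iterations=50):
    """Return {page: score}; scores sum to 1."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on, split across its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outlinks: spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page site: "c" is linked to by both "a" and "b".
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Page "c" ends up on top purely because of who links to it; the words on the
pages never enter into it.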
Bottom line: if you have the money, want to index/search several million docs
of varying format, size and complexity, and want something you can just plug
in, turn on and point at your servers, go with Google. You get the support and
the pedigree. If you don't have the money, or you want finer control over what
you index and what kind of info you keep in the index, consider Swish-e, though
be prepared to get creative with multiple indexes, etc., to minimize the
performance hit.
You get what you pay for.
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri Apr 22 13:54:07 2005