On Sat, Dec 06, 2003 at 09:23:01PM -0800, Dave Stevens wrote:
> > Pushing the design of swish-e, perhaps. Seems like more people are
> > using swish-e for large collections. How much RAM are you using?
>
> This machine is a single Athlon XP 1800+ with an inexpensive Asus K7
> board and only 512 MB of RAM.
I assume you are using -e (economy mode) when indexing.
> > Did you look at inktomi? It uses a database that is searchable as it is
> > indexed.
>
> No, but I will. I've looked at Nutch but it doesn't seem like much is
> going on, though they do post snaps fairly regularly and I've heard they
> have anonymous access to CVS but haven't used it. Last year I looked at
> Google appliances for my former sites and it was a couple hundred grand
> for a two year license. At this point there is no investment in the
> current project other than me (don't even have a business model yet) so I
> need to make it work with a free sort of software license. I'm willing to
> spend a few grand in hardware and move back into a colo and support that
> (about half a dozen boxes live in what was my dining room) but I won't be
> able to afford licensing any enterprise level software.
On such a large scale you need something where you can incrementally
update the index. Frankly, if documents are available locally I think
completely reindexing with swish-e is often as fast as updating other
types of indexes. Maybe.
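To make the trade-off concrete, here is a toy sketch in Python of what
"incrementally update" means compared with a full reindex. This is not
swish-e or Lucene code: ToyIndex, update, remove, search, and full_reindex
are all hypothetical names, and a real engine stores its postings on disk
rather than in memory.

```python
from collections import defaultdict


class ToyIndex:
    """Toy in-memory inverted index (hypothetical; not swish-e or Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.doc_terms = {}               # doc id -> set of terms in that doc

    def remove(self, doc_id):
        """Drop one document's entries without touching other documents."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def update(self, doc_id, text):
        """Incremental update: rewrite only the postings for one document."""
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, term):
        """Return the sorted ids of documents containing the term."""
        return sorted(self.postings.get(term.lower(), set()))


def full_reindex(docs):
    """Full reindex: throw the old index away and rebuild from every document."""
    index = ToyIndex()
    for doc_id, text in docs.items():
        index.update(doc_id, text)
    return index
```

With a full reindex the cost grows with the size of the whole collection
every time; with update() it grows with the size of the changed document,
at the price of keeping the per-document bookkeeping (doc_terms here)
needed to remove stale entries.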
Another one to look at, if you can stand Java, is Lucene. I haven't tried
it but their goal is an Open Source large-scale search engine. Hey, Bob
Dylan's site uses it (although I could not get it to work).
http://jakarta.apache.org/lucene/docs/index.html
--
Bill Moseley
moseley@hank.org
Received on Sun Dec 7 06:34:47 2003