Re: Swish-e scalability, performance

From: Aaron Bazar <aaronb(at)not-real.spamcop.net>
Date: Mon Nov 15 2004 - 23:52:23 GMT

# SWISH format: 2.4.1
# Search words: (null)
#
# Index File: nov13
# Name:
# Saved as: nov13
# Total Words: 3259677
# Total Files: 2024134
# Indexed on: 2004-11-13 22:15:49 CST


I just built an index with over 2 million files. Queries are still
super-fast. 

Thanks!


Aaron Bazar
http://www.accqpoint.com/

-----Original Message-----
From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
On Behalf Of Bill Moseley
Sent: Monday, November 15, 2004 4:17 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Swish-e scalability, performance

On Mon, Nov 15, 2004 at 01:04:57PM -0800, dstevens@roaddog.com wrote:
> A couple of the drawbacks with swish-e for a large Web wide search 
> tool were spider.pl after long (700-800k pages, or a few days) crawls 
> would hang or become incredibly slow even on a dual Opteron 242 with 4GB
> ram.

Hmm, do you think the machine was running low on memory?  spider.pl simply
keeps a hash of URLs seen, so it's all in memory.  It would be nice to have
spider.pl use either a database or BerkeleyDB so that it could be restarted
-- I thought about just using Storable to dump the hash to disk if it gets a
signal to abort.  Then read that back in to continue.
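
The save-on-signal idea above could look roughly like the following. This is
only an illustrative sketch in Python, not spider.pl code: pickle stands in
for Perl's Storable, and the names CrawlState, crawl, and seen.pickle are
hypothetical. It assumes the set of seen URLs still fits in memory and only
addresses restarting a crawl.

```python
import os
import pickle
import signal
import tempfile


class CrawlState:
    """In-memory set of URLs already seen, with save/restore to disk."""

    def __init__(self, path):
        self.path = path          # hypothetical state file
        self.seen = set()
        self.stop_requested = False

    def load(self):
        """Read a previously saved set back in; return how many URLs it holds."""
        if os.path.exists(self.path):
            with open(self.path, "rb") as fh:
                self.seen = pickle.load(fh)
        return len(self.seen)

    def save(self):
        """Write to a temp file first, then rename, so a crash mid-write
        does not destroy the previous state file."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(self.seen, fh)
        os.replace(tmp, self.path)

    def request_stop(self, signum, frame):
        """Signal handler: only set a flag; the crawl loop does the saving."""
        self.stop_requested = True


def crawl(state, urls):
    """Visit each unseen URL once, stopping early if an abort was requested."""
    fetched = []
    for url in urls:
        if state.stop_requested:
            break
        if url in state.seen:
            continue
        state.seen.add(url)
        fetched.append(url)       # a real spider would fetch and parse here
    state.save()
    return fetched


if __name__ == "__main__":
    state = CrawlState("seen.pickle")
    state.load()
    signal.signal(signal.SIGTERM, state.request_stop)
    signal.signal(signal.SIGINT, state.request_stop)
    crawl(state, ["http://example.com/", "http://example.com/a"])
```

The handler only sets a flag and the loop writes the file, so the dump never
happens in the middle of updating the set. A second run with the same state
file skips every URL recorded by the first.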

> To be fair I don't think the original intent of swish-e was to be a 
> Web wide level search tool, but it does a pretty good job up to a 
> million or two pages.

That's the bottom line. Kevin wrote the original swish in a weekend or so
and the basic design hasn't really changed.  Things are faster, but that's
about it.  That's kind of a problem, as you can evaluate swish and it looks
real fast compared to other indexers, but then you hit some limit and it
slows down real fast.

I'm always amazed when people post that they are using it for millions of
documents.

--
Bill Moseley
moseley@hank.org

Received on Mon Nov 15 15:52:30 2004