Re: Request for comments on new project

From: Dave Seff <dseff(at)not-real.advisen.com>
Date: Thu Jan 06 2005 - 15:22:05 GMT
I have a few reasons why I am doing this:

1. A personal learning exercise. 

2. I wanted to write it in C to make it as fast as possible. That is
not to say that mod_perl/Apache is slow, but I did not want the added
overhead. My company warehouses hundreds of millions of documents for
the insurance industry and needs to be able to search through them
quickly.

3. If any of you are familiar with the Verity K2 server, my company is
looking for an open-source replacement for it. We have applications
written for it and would like to rewrite as little of our main apps as
possible, so I need something that is completely transparent to the
applications.

4. Load balancing isn't exactly the goal I am looking for here. While
that is fine, I need a way to take multiple responses from the swish
results and collate them into a coherent order, whether by date,
relevance, etc. For example, if you are searching 100 indexes across
50 machines, the results would normally be sorted by the order in which
your load balancer received them. That may be fine, but I want the
results from the server to be transparent to the client, rather than
having the client figure out what order to tally them in. This is where
the cluster_mgr comes in. It doesn't do it yet, but that is a project
goal.
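
To make point 4 concrete, here is a minimal sketch of the collation step,
assuming each node returns its hits already sorted by descending
relevance. It is not the actual cluster_mgr code (which does not do this
yet); the names `struct hit`, `struct node_result` and `merge_results`
are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one search hit as a cluster manager might see it. */
struct hit {
    int rank;           /* relevance score, higher is better */
    const char *path;   /* document identifier */
};

/* Hypothetical: one node's reply, already sorted by descending rank. */
struct node_result {
    const struct hit *hits;
    size_t count;
};

/*
 * Merge the replies of n nodes into out (capacity max_out), best rank
 * first. Returns the number of hits written, or 0 if the cursor array
 * cannot be allocated. On equal rank the lower-numbered node wins.
 */
size_t merge_results(const struct node_result *nodes, size_t n,
                     struct hit *out, size_t max_out)
{
    assert(nodes != NULL && out != NULL);

    /* One read position per node; calloc zero-initialises them. */
    size_t *cursor = calloc(n ? n : 1, sizeof *cursor);
    if (cursor == NULL)
        return 0;

    size_t written = 0;
    while (written < max_out) {
        size_t best = n;    /* n means "no node has hits left" */

        /* Pick the node whose next unread hit has the highest rank. */
        for (size_t i = 0; i < n; i++) {
            if (cursor[i] >= nodes[i].count)
                continue;
            if (best == n ||
                nodes[i].hits[cursor[i]].rank >
                nodes[best].hits[cursor[best]].rank)
                best = i;
        }
        if (best == n)
            break;          /* every node is exhausted */

        out[written++] = nodes[best].hits[cursor[best]++];
    }

    free(cursor);
    return written;
}
```

The linear scan over nodes is fine for something like 50 machines; with
many more nodes a heap keyed on the next hit's rank would cut the work per
result. Sorting by date instead of relevance would only change the
comparison, provided the nodes return their hits in that same order.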


On Wed, 2005-01-05 at 19:37, Michael Peters wrote:
> Bill Moseley wrote:
> > On Wed, Jan 05, 2005 at 06:46:31PM -0500, Dave Seff wrote:
> > 
> >>I have looked at SWISHED. The one thing I wanted to avoid is Apache.
> >>Swishd uses persistent connections and forks with each new connection.
> >>It does not do any caching, although I don't see any reason why it
> >>shouldn't have it.
> > 
> > 
> > Avoid Apache to save memory?  Or why?  I wonder how your code would
> > compare against SWISHED.
> 
> I have the same question. Apache is *very* stable, well supported and 
> well known. If you used it as your base then all things like handling 
> multiple clients, logging, etc are handled for you. Especially with the 
> flexibility of Apache2 I see almost no reason to implement a network 
> daemon. You'd just be playing catch-up.
> 
> Also in regards to your cluster manager, load balancing is a well known 
> problem with existing solutions. If you were using apache as the 
> backend, then it should be trivial to put an open source load balancer 
> (LVS, etc) in front of it.
> 
> I'm not trying to knock the idea. I like the prospect of having a 
> cluster handling the searching, indexing, etc. It's just that I've been 
> developing mod_perl for a while and following mod_perl2/apache2 with a 
> lot of interest. It's really exciting what you can do with 
> apache2/mod_perl2 that is more than just dynamic web sites (protocol 
> handlers, etc).
-- 
Dave Seff <dseff@advisen.com>
Advisen Ltd.
Received on Thu Jan 6 07:22:18 2005