
Re: Kudos

From: Bill Moseley <moseley(at)>
Date: Mon Jan 08 2007 - 14:36:07 GMT
On Mon, Jan 08, 2007 at 01:11:14AM -0800, James wrote:
> I would be EXTREMELY happy if Swish-e was UTF-8 compatible before the end of
> the year.  Isn't that a reasonable goal?  I think the developers should
> shoot for September 1st (that gives them the summer to work through this
> too) as the date to release the UTF-8 compatible Swish-e.  I believe once
> Swish-e does this, you'll receive MUCH more attention (not that you don't
> receive a lot of attention now!).

I would be happy, too. I'm just worried that the attention I would
get would be from the bank repossessing my house. Swish processes
text -- so almost all the code deals with characters. Plus, Swish has
been worked on by a number of developers for a decade or longer now,
so much of the code is showing its age. A lot of it would need to be
rewritten.

Total Physical Source Lines of Code (SLOC)                = 62,677
Development Effort Estimate, Person-Years (Person-Months) = 15.42 (185.00)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.51 (18.17)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 10.18
Total Estimated Cost to Develop                           = $ 2,082,597
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
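For anyone checking the arithmetic: the figures above follow from the Basic COCOMO (organic mode) formulas printed in the report, plus its stated salary and overhead. The short Python sketch below recomputes them from the SLOC count alone. It is only a sanity check of SLOCCount's math, not anything taken from Swish-e itself.

```python
"""Recompute the SLOCCount Basic COCOMO estimate quoted above."""

SLOC = 62_677     # total physical source lines, from the report
SALARY = 56_286   # average salary per year (USD), from the report
OVERHEAD = 2.40   # overhead multiplier, from the report


def basic_cocomo(sloc: int, salary: float, overhead: float) -> dict[str, float]:
    """Basic COCOMO, organic mode, as listed in the SLOCCount output."""
    ksloc = sloc / 1000
    person_months = 2.4 * ksloc ** 1.05            # effort
    schedule_months = 2.5 * person_months ** 0.38  # calendar time
    developers = person_months / schedule_months   # average staffing
    cost = (person_months / 12) * salary * overhead
    return {
        "person_months": person_months,
        "person_years": person_months / 12,
        "schedule_months": schedule_months,
        "developers": developers,
        "cost": cost,
    }


if __name__ == "__main__":
    est = basic_cocomo(SLOC, SALARY, OVERHEAD)
    print(f"Effort:     {est['person_years']:.2f} person-years "
          f"({est['person_months']:.2f} person-months)")
    print(f"Schedule:   {est['schedule_months'] / 12:.2f} years "
          f"({est['schedule_months']:.2f} months)")
    print(f"Developers: {est['developers']:.2f}")
    print(f"Cost:       ${est['cost']:,.0f}")
```

Running it gives values that match the report within rounding; the cost line can differ by a few dollars because the displayed effort is itself rounded.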

So we either need someone with lots of time or someone with lots of
money. ;)

A few days ago I was going over in my head an idea of how to patch the
current code to get some level of utf8 in it for those that need it
now. I was basically wondering how well the current code could work
without knowing the encoding at all -- that is, where byte comparisons
would be fine and where they would not (I think there are cases where
two utf8 strings represent the same text but have different byte values).
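One common way that happens (my reading of the remark, not something the message spells out) is Unicode canonical equivalence: "é" can be stored as the single code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two look identical but encode to different UTF-8 bytes, so a plain byte comparison treats them as different words. The Python sketch below shows the mismatch and one standard remedy, normalizing both sides to NFC before comparing; `same_text` is a hypothetical helper name.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings by their UTF-8 bytes after NFC normalization."""
    na = unicodedata.normalize("NFC", a).encode("utf-8")
    nb = unicodedata.normalize("NFC", b).encode("utf-8")
    return na == nb


if __name__ == "__main__":
    precomposed = "\u00e9"  # é as one code point
    decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

    print(precomposed.encode("utf-8"))  # b'\xc3\xa9'
    print(decomposed.encode("utf-8"))   # b'e\xcc\x81'
    print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
    print(same_text(precomposed, decomposed))  # True
```

For an indexer this would mean normalizing both at index time and at query time, so that byte comparisons afterwards are safe for these cases.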

We might have to lose wildcard searches and the ability to do first-letter
searches for words (there's currently a 256-entry table that handles
that). Many of the config options might not work, the concept of
"WordCharacters" would likely not work, we'd have to look at a new
regex engine, and so on.
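To see why a 256-entry table keyed on bytes stops meaning "first letter", note that in UTF-8 every character from U+00C0 to U+00FF begins with the same lead byte, 0xC3. The sketch below uses hypothetical stand-ins (`first_letter_table` and `tokenize_bytes` are illustrative only, not Swish-e's actual code) for a first-byte lookup table and for a WordCharacters-style set of allowed bytes.

```python
# Hypothetical stand-ins for byte-oriented structures; not Swish-e's actual code.


def first_letter_table(words: list[str]) -> list[list[str]]:
    """Bucket words by their first *byte*, like a 256-entry lookup table."""
    table: list[list[str]] = [[] for _ in range(256)]
    for w in words:
        table[w.encode("utf-8")[0]].append(w)
    return table


def tokenize_bytes(data: bytes, word_bytes: set[int]) -> list[bytes]:
    """Split data into maximal runs of bytes that appear in word_bytes."""
    words: list[bytes] = []
    current = bytearray()
    for b in data:
        if b in word_bytes:
            current.append(b)
        elif current:
            words.append(bytes(current))
            current = bytearray()
    if current:
        words.append(bytes(current))
    return words


if __name__ == "__main__":
    table = first_letter_table(["été", "à", "ôter", "zèbre"])
    print(table[0xC3])  # ['été', 'à', 'ôter']: one bucket for three letters

    # A WordCharacters-style setting that lists é, stored byte by byte.
    word_bytes = set("abcdefghijklmnopqrstuvwxyzé".encode("utf-8"))
    print(tokenize_bytes("naïve".encode("utf-8"), word_bytes))
    # [b'na\xc3', b've']: ï (C3 AF) is split because 0xAF is not in the set
```

Listing é in the character set only adds its two bytes, 0xC3 and 0xA9, so any other character sharing the 0xC3 lead byte gets cut in half.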

Basically, start pushing utf8 into swish and see where
things break. That *might* be faster than a rewrite. Or it could be
a waste of time if it just leads to a rewrite anyway, done the hard way.
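As a concrete example of "seeing where things break", here is a small Python probe of one likely trouble spot, case folding. It assumes two hypothetical kinds of 8-bit lowercasing (I'm not describing Swish-e's actual code): one that treats every byte as an ISO-8859-1 character, like a Latin-1 tolower() table, and one that only touches ASCII A-Z.

```python
def latin1_lower(data: bytes) -> bytes:
    """Hypothetical 8-bit lowercasing: treat every byte as an ISO-8859-1 char."""
    return data.decode("latin-1").lower().encode("latin-1")


def ascii_lower(data: bytes) -> bytes:
    """Hypothetical ASCII-only lowercasing: bytes >= 0x80 pass through untouched."""
    return data.lower()  # bytes.lower() only changes A-Z


def still_valid_utf8(data: bytes) -> bool:
    """True if data still decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for word in ["hello", "café", "ÉCOLE"]:
        raw = word.encode("utf-8")
        print(word,
              "| latin-1 lower still valid:", still_valid_utf8(latin1_lower(raw)),
              "| ascii lower:", ascii_lower(raw).decode("utf-8"))
```

The Latin-1 version corrupts any word containing a character like é, because UTF-8 lead bytes such as 0xC3 are uppercase letters in Latin-1 and get shifted by 0x20. The ASCII-only version keeps the bytes valid but never folds "É" to "é", so "ÉCOLE" comes out as "École" and would still index differently from "école".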

Of course, I have not tried it yet.  I've been hoping someone with
time and utf8 experience might show up one of these days.  Is that

Bill Moseley

Received on Mon Jan 8 06:36:08 2007