Bill,
Maybe you could steal one of these guys:
http://mnogosearch.com/doc/msearch-authors.html
I am sure one of them has the needed experience. :-) :-P
Kind Regards,
James
On 1/8/07, Bill Moseley wrote:
>
> On Mon, Jan 08, 2007 at 01:11:14AM -0800, James wrote:
> > I would be EXTREMELY happy if Swish-e was UTF-8 compatible before the
> > end of the year. Isn't that a reasonable goal? I think the developers
> > should shoot for September 1st (that gives them the summer to work
> > through this too) as the date to release the UTF-8 compatible Swish-e.
> > I believe once Swish-e does this, you'll receive MUCH more attention
> > (not that you don't receive a lot of attention now!).
>
> I would be happy, too. I'm just worried that the attention I would
> get would be from the bank repossessing my house. Swish processes
> text -- so almost all the code deals with characters. Plus, Swish has
> been worked on by a number of developers for a decade or longer now,
> and much of the code is showing its age. As a result, much of the
> code would need to be rewritten.
>
> Total Physical Source Lines of Code (SLOC) = 62,677
> Development Effort Estimate, Person-Years (Person-Months) = 15.42 (185.00)
> (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
> Schedule Estimate, Years (Months) = 1.51 (18.17)
> (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
> Estimated Average Number of Developers (Effort/Schedule) = 10.18
> Total Estimated Cost to Develop = $ 2,082,597
> (average salary = $56,286/year, overhead = 2.40).
> SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
>
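For reference, the estimate above follows directly from the two basic COCOMO formulas SLOCCount prints. A minimal Python sketch that recomputes it; the salary and overhead defaults are copied from the output, and combining them as effort-in-person-years x salary x overhead is my assumption about how the cost line is derived (it lands within a few dollars of the reported figure):

```python
# Basic COCOMO as printed in the SLOCCount output above.
# Assumption: cost = effort in person-years * average salary * overhead.


def basic_cocomo(sloc, salary=56286, overhead=2.40):
    """Return (person-months, schedule months, avg developers, cost)."""
    ksloc = sloc / 1000.0
    effort_pm = 2.4 * ksloc ** 1.05        # Person-Months = 2.4 * KSLOC^1.05
    schedule_mo = 2.5 * effort_pm ** 0.38  # Months = 2.5 * PM^0.38
    developers = effort_pm / schedule_mo   # Effort / Schedule
    cost = (effort_pm / 12.0) * salary * overhead
    return effort_pm, schedule_mo, developers, cost


if __name__ == "__main__":
    pm, months, devs, cost = basic_cocomo(62677)
    print(f"{pm:.2f} person-months, {months:.2f} months, "
          f"{devs:.2f} developers, ${cost:,.0f}")
```

The 1.05 exponent means effort grows slightly faster than linearly with code size.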
> So we either need someone with lots of time or someone with lots of
> money. ;)
>
> A few days ago I was turning over an idea for patching the current
> code to get some level of utf8 support in it for those who need it
> now. I was basically wondering how well the current code would work
> without knowing the encoding -- that is, where byte comparisons
> would be fine and where they would not (I think there are cases where
> two utf8 characters would be the same but have different byte values).
>
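Such cases do exist: Unicode treats a precomposed character and its base-letter-plus-combining-mark spelling as canonically equivalent, but their UTF-8 bytes differ, so a plain memcmp/strcmp-style comparison calls them different. A minimal Python sketch; normalizing both sides to NFC before comparing is one common approach and an assumption here, not a description of anything in Swish-e:

```python
import unicodedata

# Two spellings of "é" that Unicode treats as canonically equivalent.
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE (U+00E9)
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT (U+0301)


def same_bytes(a, b):
    """What an encoding-unaware byte comparison sees."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_text(a, b, form="NFC"):
    """Compare after normalizing both strings to one canonical form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(precomposed.encode("utf-8"))          # b'\xc3\xa9'
    print(decomposed.encode("utf-8"))           # b'e\xcc\x81'
    print(same_bytes(precomposed, decomposed))  # False
    print(same_text(precomposed, decomposed))   # True
```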
> We might have to lose wildcard searches and the ability to do first
> letter searches for words (there's currently a 256-entry table that
> handles that). Many of the config options might not work, the
> concept of "WordCharacters" would likely not work, we'd have to look
> at a new regex engine, and so on.
>
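Why a 256-entry table is a problem: in UTF-8 every non-ASCII character takes two to four bytes, and many characters share the same lead byte (é, è and ü all start with 0xC3). A minimal Python sketch, assuming the table is indexed by a word's first byte; both function names are hypothetical and do not come from the Swish-e source:

```python
# Hypothetical stand-in for a 256-entry first-letter table: one bucket
# per possible value of a word's first *byte*.


def first_byte_index(words):
    table = [[] for _ in range(256)]
    for w in words:
        table[w.encode("utf-8")[0]].append(w)
    return table


def first_char_index(words):
    # Keyed on the first decoded character instead, so it has no fixed size.
    index = {}
    for w in words:
        index.setdefault(w[0], []).append(w)
    return index


if __name__ == "__main__":
    words = ["été", "ère", "über", "apple"]
    by_byte = first_byte_index(words)
    print(by_byte[0xC3])  # all three accented words share lead byte 0xC3
    print(first_char_index(words))
```

The same issue applies to any per-byte character class, such as a byte-level "WordCharacters" lookup: "é" is two bytes, neither of which is a letter by itself.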
> Basically, start pushing utf8 into swish and see where things break.
> That *might* be faster than a rewrite. Or it could be a waste of
> time if it just leads to a rewrite anyway, done the hard way.
>
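One way to "push utf8 in and see where things break": run sample UTF-8 strings through byte-oriented operations and compare each result with a Unicode-aware version. A minimal Python sketch; the probes and sample strings are hypothetical choices, and `bytes.lower()` stands in for an ASCII-only case-folding table:

```python
# Hypothetical probes: each pairs a byte-oriented operation (how
# encoding-unaware code treats the text) with a Unicode-aware one.
SAMPLES = ["ÉTÉ", "naïve", "Straße"]


def byte_lower(s):
    # bytes.lower() folds only ASCII A-Z, like an ASCII-only case table
    return s.encode("utf-8").lower().decode("utf-8")


def byte_length(s):
    # strlen()-style count: bytes, not characters
    return len(s.encode("utf-8"))


def byte_truncate(s, n):
    # Cut after n bytes; may split a multibyte sequence
    return s.encode("utf-8")[:n].decode("utf-8", errors="replace")


PROBES = {
    "lowercase": (byte_lower, str.lower),
    "length": (byte_length, len),
    "truncate3": (lambda s: byte_truncate(s, 3), lambda s: s[:3]),
}


def find_breakage(samples):
    """Map each probe name to the samples where the byte version differs."""
    broken = {}
    for name, (byte_op, unicode_op) in PROBES.items():
        bad = [s for s in samples if byte_op(s) != unicode_op(s)]
        if bad:
            broken[name] = bad
    return broken


if __name__ == "__main__":
    for name, bad in find_breakage(SAMPLES).items():
        print(f"{name}: breaks on {bad}")
```

Each failing probe points at a class of code (case folding, length limits, truncation) that would need attention.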
> Of course, I have not tried it yet. I've been hoping someone with
> time and utf8 experience might show up one of these days. Is that
> you?
>
> --
> Bill Moseley
Received on Mon Jan 8 06:45:43 2007