Re: [swish-e] Version 2.4.5 Error

From: Bill Moseley <moseley(at)>
Date: Wed Sep 12 2007 - 20:44:11 GMT
On Wed, Sep 12, 2007 at 03:40:04PM -0500, Peter Karman wrote:
> So likely there's an issue with and how it is calculating length()
> for docs with unreliable encodings. That's my guess anyway. could
> probably be made smarter about sanity checking the docs for length and
> encoding, and made to fail gracefully somehow. I know there's been talk here
> lately about some of the encoding stuff it does.

The spider just needs to *always* decode on input, then encode back to
the original charset, and then use length() to report the length.
That seems like the simplest and most correct way to go. Seems right to
you, Peter?
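A minimal sketch of the decode, re-encode, measure sequence described above, written in Python rather than the spider's own Perl. The names `normalize_document` and `swish_record` are hypothetical. The `Path-Name:` / `Content-Length:` header layout is an assumption based on swish-e's "prog" input method and is not taken from this thread. Using `errors="replace"` for documents whose bytes don't match their declared charset is also an assumption: it is one possible reading of "fail gracefully".

```python
def normalize_document(raw: bytes, charset: str) -> tuple[bytes, int]:
    """Decode on input, encode back to the original charset, measure bytes."""
    # Decode on input. Assumption: bad byte sequences become U+FFFD
    # instead of raising UnicodeDecodeError.
    text = raw.decode(charset, errors="replace")
    # Encode back to the original charset. Characters the charset can't
    # represent (e.g. U+FFFD in latin-1) become "?".
    body = text.encode(charset, errors="replace")
    # The reported length is the byte length of what is actually emitted,
    # not the character count of the decoded text.
    return body, len(body)


def swish_record(path: str, raw: bytes, charset: str) -> bytes:
    """Build one document record for swish-e (hypothetical helper).

    Assumption: header format follows swish-e's "prog" input method.
    """
    body, length = normalize_document(raw, charset)
    header = f"Path-Name: {path}\nContent-Length: {length}\n\n"
    return header.encode("utf-8") + body
```

The important step is that the length comes from the re-encoded bytes. For `"café"` in UTF-8 the decoded text has 4 characters, but `Content-Length` must be 5. A document that claims UTF-8 but contains a stray latin-1 `\xe9` still gets a length that matches the bytes sent, so the length and the body can't disagree.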

Bill Moseley

Received on Wed Sep 12 16:44:11 2007