On Thu, Mar 29, 2007 at 03:51:24PM +0200, Clint wrote:
> Just to let you know, the following code:
> my $bytecount = length($$content);
Interesting, I would expect that not to work.
What version of Perl are you using?
Also, which version of the spider? There's a version number in the
I recently updated the spider to deal better (I hope) with character
encodings. So, I'm curious if there's a problem with the new code.
I'm also curious whether your server might be reporting the incorrect
character encoding.
The spider is supposed to look at the character encoding reported by
the web server (or in a meta tag in the web page) and decode the content
into Perl's internal character encoding. The length() function, as you
have it above, should report the number of *characters*, not bytes,
and the two counts differ whenever the content contains multi-byte
characters.
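A minimal Perl sketch of the character-vs-byte difference, using only the core Encode module. The sample string and the hard-coded $charset are made up for illustration; this is not the spider's own decoding code, which works out the charset from the HTTP header or meta tag:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might arrive from a web server: "café" in UTF-8.
# (\xc3\xa9 is the two-byte UTF-8 encoding of "é".)
my $raw = "caf\xc3\xa9";

# Hypothetical charset, standing in for whatever the server header or
# <meta> tag reports; the real detection logic is not shown here.
my $charset = 'UTF-8';

# Decode into Perl's internal character representation.
my $content = decode($charset, $raw);

# length() on a decoded string counts characters, not bytes.
my $char_count = length($content);

# To count bytes, re-encode to the wire encoding first.
my $byte_count = length(encode($charset, $content));

print "characters: $char_count, bytes: $byte_count\n";
```

This prints "characters: 4, bytes: 5". So if you really want a byte count of decoded content, encoding it back to the target charset before calling length() is one straightforward way to get it.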
Is it possible you are indexing utf8 source but the web server is
reporting it as an eight-bit encoding? I'm not sure if decoding utf8
as latin1 would generate a warning.
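A small Perl sketch of that mislabelled-page scenario, assuming a UTF-8 page that the server (hypothetically) labels as ISO-8859-1. Because every byte value from 0x00 to 0xFF is a valid Latin-1 character, Encode decodes it without complaint even when asked to croak on errors; the result simply has more characters than intended:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "café" encoded as UTF-8: the final "é" is the two bytes \xc3\xa9.
my $utf8_bytes = "caf\xc3\xa9";

# Decode as Latin-1 with strict checking. LEAVE_SRC stops decode()
# from consuming $utf8_bytes in place, so we can reuse it below.
my $misdecoded = decode('iso-8859-1', $utf8_bytes,
                        Encode::FB_CROAK() | Encode::LEAVE_SRC());

# No error was raised; each byte became its own character ("cafÃ©").
my $wrong_length = length($misdecoded);

# Decoding with the correct charset gives the intended characters.
my $right_length = length(decode('UTF-8', $utf8_bytes));

print "as latin1: $wrong_length chars, as UTF-8: $right_length chars\n";
```

This prints "as latin1: 5 chars, as UTF-8: 4 chars". In that case the character count from length() would come out equal to the byte count, with no warning to signal the mismatch.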
Received on Thu Mar 29 11:42:51 2007