On Fri, Dec 12, 2003 at 07:43:18AM -0800, John Angel wrote:
> > Think of your suggestion. One document is in 1250 and it includes a word
> > with the "d"-slash character. That word gets indexed -- since the index
> > stores numbers (not characters) that stored word includes the F0 byte.
> > The next document is in 8859-1 and it includes some word with the "eth"
> > character (it's an Icelandic document, I suppose) and that gets indexed,
> > and again there's a word that includes byte F0 in the index.
> >
> > Now you have a value in the index "F0" that represents more than one
> > character. So when searching are you looking for a 1250 char or 8859-1
> > char? You can't tell.
>
> It doesn't matter, as long as you find that character.
>
> Why doesn't it matter? Because I will put the charset directly in the HTML.
> The search script just has to find F0 every time; it is not important which
> character that is.
Perhaps someone else can explain it better than I can.
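
In the meantime, here is a rough sketch in Python of the collision I am
describing. It is only an illustration, not how any real indexer stores its
data: the two file names, the sample words, and the helper functions
search_bytes() and search_text() are all made up for this example.

```python
# Two hypothetical documents, stored as raw bytes in different charsets.
# "me\u0111a" contains d-with-stroke (U+0111); "ma\u00f0ur" contains eth (U+00F0).
DOCS = {
    "croatian.html": ("cp1250", "me\u0111a".encode("cp1250")),
    "icelandic.html": ("iso-8859-1", "ma\u00f0ur".encode("iso-8859-1")),
}

# Both characters end up as the very same byte on disk.
assert "\u0111".encode("cp1250") == b"\xf0"
assert "\u00f0".encode("iso-8859-1") == b"\xf0"


def search_bytes(query):
    """Match on raw bytes, ignoring each document's charset."""
    return sorted(name for name, (_, data) in DOCS.items() if query in data)


def search_text(query):
    """Decode each document with its own charset, then match characters."""
    return sorted(
        name for name, (charset, data) in DOCS.items()
        if query in data.decode(charset)
    )


# The user types d-with-stroke on a 1250 search form.
byte_hits = search_bytes("\u0111".encode("cp1250"))
text_hits = search_text("\u0111")

print(byte_hits)  # both documents match on byte F0
print(text_hits)  # only the 1250 document contains the character
```

Searching for the byte finds the Icelandic document too, even though it
contains no d-with-stroke at all. Whether that counts as a wrong hit depends
on what you want a search for that character to mean.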
--
Bill Moseley
moseley@hank.org
Received on Fri Dec 12 15:59:03 2003