Re: HTML Entities

From: <Rainer.Scherg(at)not-real.rexroth.de>
Date: Tue Nov 28 2000 - 10:34:40 GMT
Mhh,

 the behavior is IMO wrong (again, for historical reasons).

 The decoding of html entities should only be done in the
 html indexing routine (the same for wml and xml).

 WordCharacters should then be applied only to the decoded strings...
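
 To make the ordering concrete, here is a rough Python sketch (not
 swish-e code). WORD_CHARACTERS, split_words and index_html_words are
 hypothetical names, and the character set is only an assumed stand-in
 for a WordCharacters setting that leaves out "&", "#" and ";". It
 compares splitting the raw text with decoding the entities first.

```python
import html
import re

# Hypothetical stand-in for a WordCharacters setting: letters and digits,
# but none of '&', '#' or ';', so those three act as word separators.
WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz0123456789áéíóúñ"


def split_words(text, word_chars=WORD_CHARACTERS):
    """Split text into runs of characters that are all in word_chars."""
    pattern = "[" + re.escape(word_chars) + "]+"
    return re.findall(pattern, text.lower())


def index_html_words(text, word_chars=WORD_CHARACTERS):
    """Decode HTML entities first, then apply the word-character filter."""
    return split_words(html.unescape(text), word_chars)


if __name__ == "__main__":
    raw = "electr&#243;nicos"
    # Filtering before decoding breaks the word at the entity.
    print(split_words(raw))
    # Decoding before filtering keeps the word whole.
    print(index_html_words(raw))
```

 With this assumed character set, splitting the raw text gives three
 fragments (electr, 243, nicos), while decoding first gives the single
 word electrónicos, the same as for text that already contains the
 literal character.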


cu - rainer




> -----Original Message-----
> From: jmruiz@boe.es [mailto:jmruiz@boe.es]
> Sent: Tuesday, November 28, 2000 10:17 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: HTML Entities
> 
> 
> Hi Bill,
> 
> I think this is also the way swish 1.3 works. 
> Probably, it is working this way for historical reasons.
> Should it be changed?
> 
> cu
> Jose
> 
> 
> 
> On 27 Nov 2000, at 12:26, Bill Moseley wrote:
> 
> > Ok, last post.  (maybe not)
> > 
> > Just to be clear, I'm talking about indexing words like this:
> > 
> >    electr&#243;nicos
> >    electrónicos
> > 
> > I think swish should index those the same (as electrónicos), but if
> > any of "&#;" are not included in WordCharacters then you end up like
> > this
> > 
> > -----> WORD INFO <-----
> > 243: 1 1 9 1 2
> > electr: 1 1 9 1 1
> > electrónicos: 1 1 9 1 4  <-- This indexed ok.
> > nico: 1 1 9 1 3
> > 
> > 
> > 
> > Bill Moseley
> > mailto:moseley@hank.org
> > 
> 
> 
> 
> 
Received on Tue Nov 28 10:37:00 2000