Re: Specified IndexContents HTML but swish still uses HTML2

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Feb 03 2006 - 19:46:33 GMT

On Fri, Feb 03, 2006 at 01:17:08PM -0600, Jon Sorensen wrote:
> > Are you sure you want to preserve entities?
> >
> 
> maybe I'm misunderstanding the documentation
> 
> I want entities such as &reg; to be preserved in the index
> so that when swish.cgi returns results it doesn't return the
> ascii character for &reg;, but the entity in the html.

Do you want people to be able to search for "reg" in your documents?

The entities are (in part) a way to include characters that are not
in the encoding you are delivering your web pages in.

Many of the common entities like &reg; are in Latin-1 so swish can
handle those and you don't need to use entities in your output if you
state that your encoding is 8859-1.

If you are using any entities that do not map to 8859-1 then swish
will replace those with a space.  (Swish only indexes 8-bit characters.)

If you still want to use entities then you should convert the text in
your search script back into entities when generating results, just
like you would escape < > &.

The entities are only needed when sending the text to a web browser
and your encoding does not include those characters.
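
As a rough illustration of converting result text back into entities,
here is a minimal sketch in Python (swish.cgi itself is Perl, so this
only shows the idea and is not code from swish).  The function name
to_entities is made up for the example, and it assumes the text coming
back from the index has already been decoded from 8859-1 into a string.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Hypothetical helper: escape markup characters and replace
    non-ASCII characters with named (or numeric) HTML entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in '<>&"':
            # Markup characters always have a named entity.
            out.append(f"&{codepoint2name[cp]};")
        elif cp > 127:
            # Prefer a named entity such as &reg;; otherwise fall back
            # to a numeric character reference.
            name = codepoint2name.get(cp)
            out.append(f"&{name};" if name else f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)

# Assumption: the raw result bytes are Latin-1 (8859-1).
raw = b"Acme\xae <fast> & caf\xe9"
print(to_entities(raw.decode("latin-1")))
```

If the page is served as 8859-1 only the < > & " escaping is really
needed; the entity conversion matters when the output encoding cannot
represent those characters.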

-- 
Bill Moseley
moseley@hank.org
Received on Fri Feb 3 11:46:35 2006