Re: swish-e-2.1-dev-25-2002-01-09

From: W. Addy Majewski <wam(at)>
Date: Fri Jan 11 2002 - 16:07:00 GMT
Bill Moseley wrote:

>At 07:23 AM 01/11/02 -0800, W. Addy Majewski wrote:
>>When indexing with "StoreDescription HTML <description> xxx", <script> 
>>tag content within the first xxx characters of the document is included 
>>and printed in the results page (documents with no meta description 
>>only). Scripts within the xxx chars are also searchable. A bug?
>Likely.  Are you using the HTML or HTML2 parser?
>I see in html.c this comment:
>//$$$$ Todo: remove tag and content of scripts, css, java, embeddedobjects,
>comments, etc
>I'm a bit surprised that you can't block it with <!-- --> comments.  And
>also that it's not being blocked by the IgnoreMetaTags directive.
>Let me look at the code.
>In the meantime, if you are not using HTML2 (libxml2) you might download
>that so you can build swish with it.  It will be a better parser than the
>built-in parser.  Plus, I'll probably only be able to fix the HTML2 parser.
The offending scripts did not use <!-- --> comments. We will try to add 
them as a workaround.
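
For illustration only, another interim option would be to strip script content from the documents before they reach the indexer. The sketch below is an assumption, not anything taken from the swish-e source: it uses Python's standard html.parser module, and the names ScriptStripper and visible_text are hypothetical. It keeps document text and drops everything inside <script> and <style> elements, whether or not the scripts are wrapped in <!-- --> comments.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Collect document text, skipping anything inside <script> or <style>.

    Hypothetical helper for illustration; not part of swish-e.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # text fragments that should be kept
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies arrive here as plain data; drop them.
        if self.skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Return the text of an HTML string with script/style content removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    # Join fragments and collapse runs of whitespace into single spaces.
    return " ".join(" ".join(parser.parts).split())
```

Running visible_text over a page gives the text that a description or index would ideally be built from. HTML comments outside scripts are dropped as well, since the parser never passes them to handle_data.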

I believe it is the built-in parser; I didn't change anything in the source.

W. Addy Majewski
Ecopoint Inc.
P.O. Box 51074, Bramalea, ON  L6T 5M2, Canada
(905) 458-8562, fax (905) 458-0403

Received on Fri Jan 11 16:07:40 2002