HTML Comments are being indexed by Swish

From: Madhumita Banerjee <mbanerjee(at)not-real.princesscruises.com>
Date: Tue Apr 03 2001 - 01:13:02 GMT
Hi,

I have downloaded and installed Swish-E on AIX (Version 4.3).
The search engine was installed properly and the index file is also being
generated. I am using the FileSystem access method.
The browser I am using is IE 5.5.
The problem I am facing is as follows:

I have a few HTML pages containing code similar to the following (i.e. HTML tags
enclosed within comment tags):

<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>qltllwr</o:Author>
  <o:Keywords>captain, capt, capt.,pax svs director, passenger services
director, psd, psds, psd's,pax svs. director, maytre d, maitre d', maitres d',
maitre' d,maitre d's,onboard, maitre d, fleet personnel, fleet, dining captain,
officer, officers, on board,onboard people, on board prople, onboard mgmt, on
board mgmt,onboard management, on board management, on board, onboard stadd, on
board staff, cruise director, cruise directors, cd, cd's, cds,commodore,
comm.,ship staff, ship's staff, ship personnel, personel, personnel, purser,
pursers, perser, persers,manning</o:Keywords>
 </o:DocumentProperties>
</xml><![endif]-->

The problem is that the search engine is indexing the text contained within the
comments in the above code (i.e. although the browser does not display any of
the text within the comments, a search for that text returns a link to the page).

The changes I have made to the "user.config" file are:
     IndexDir - set to the directory to be indexed.
     IndexFile - path where the Index file has to be placed.
     IndexName - set to the required index name.
     IndexComments 0
The rest of the variables have been set to the default values.

Apart from this, the "INDEXTAGS" variable (in config.h) has been set to 0.

I have noticed that the text within a comment is indexed whenever an HTML tag is
present inside the comment, whereas it is not indexed if the comment contains no
other tags. Can you please suggest a workaround for this problem (without any
change having to be made to the existing HTML pages)? Do I have to set any other
variable in the config files?
Also, the search engine does not recognize <SCRIPT> tags, so any code enclosed
as JavaScript is also indexed. Can you suggest a means to avoid this?
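
For illustration, one possible workaround that leaves the existing pages
untouched is to index cleaned copies of them. The following is only a minimal
sketch in Python, not a Swish-E feature: the function name strip_unindexable is
hypothetical, and it assumes that comments and <SCRIPT> blocks are not nested in
unusual ways. It removes <SCRIPT> elements and HTML comments (including
conditional comments that contain tags, as in the example above) from a page's
text.

```python
import re

# Matches an HTML comment across multiple lines, including conditional
# comments such as <!--[if gte mso 9]> ... <![endif]-->.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Matches a <script> element (any letter case) up to its closing tag.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>", re.DOTALL | re.IGNORECASE
)


def strip_unindexable(html: str) -> str:
    """Return html with <script> elements and comments removed.

    Hypothetical helper for illustration; it is not part of Swish-E.
    """
    # Remove scripts first, so that a "<!--" used to hide script code
    # from old browsers is removed together with the script.
    without_scripts = SCRIPT_RE.sub(" ", html)
    # Then remove whatever comments remain, tags inside them included.
    return COMMENT_RE.sub(" ", without_scripts)
```

The cleaned output could be written to a separate directory tree, with IndexDir
pointed at that tree instead of the original pages. Regular expressions are a
rough tool for HTML, so this should be checked against the actual pages before
being relied on.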

Thanks,
Madhumita.
Received on Tue Apr 3 01:19:27 2001