
Problem with IndexContents setting

From: Zeljan Silje <zeljan(at)not-real.laus.hr>
Date: Thu Oct 17 2002 - 12:00:50 GMT
Hi,

I am using swish 2.2.1 on Red Hat Linux 6.1 (kernel 2.2.12-20), and here is
the output of querying rpm:

swish@nera$ rpm -qa| grep xml
libxml10-1.0.0-2
libxml-1.8.6-2
libxml-devel-1.8.6-2

If I put the following line in my config file:
IndexContents HTML2 .htm .html .shtml

indexing works fine on regular files (.html, .htm, .doc, .pdf), but if swish
tries to index an x.html file that is not really an HTML file (in my case it is
a GIF picture, misnamed by mistake), swish just hangs forever on that file. The
same thing happens if an HTML file is very badly formed (something after the
</body> tag).

Changing the config file to:
IndexContents HTML .htm .html .shtml

helps: swish then works OK.
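The two config lines differ only in the parser name. In both cases IndexContents assigns a parser by file extension, so a GIF saved as x.html is handed to whichever HTML parser is configured. The small Python model below illustrates that extension-to-parser mapping under this assumption; `parse_index_contents` and `parser_for` are hypothetical helpers, not part of swish, and real swish matching may differ in details such as case handling.

```python
import os


def parse_index_contents(line):
    """Split a line like 'IndexContents HTML2 .htm .html .shtml' into
    (parser_name, [extensions]). Hypothetical helper for illustration only."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "IndexContents":
        raise ValueError("not an IndexContents line: %r" % line)
    return parts[1], parts[2:]


def parser_for(filename, directives):
    """Return the parser whose extension list contains filename's extension,
    or None. Only the file *name* is inspected; the content is never read,
    which is why a misnamed GIF still reaches the HTML parser in this model."""
    ext = os.path.splitext(filename)[1]
    for line in directives:
        parser, exts = parse_index_contents(line)
        if ext in exts:
            return parser
    return None
```

For example, `parser_for("x.html", ["IndexContents HTML2 .htm .html .shtml"])` returns `"HTML2"`, and swapping the directive to `HTML` changes only the returned parser name, not which files are selected.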

Is there some setting to instruct swish to skip "bad" files? Is there any other
solution that still lets me use the libxml2 parser in such cases?
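One possible workaround, which is not a swish setting, is to scan the document tree before indexing and flag HTML-named files matching the two cases described above, so they can be moved out of the indexed tree or renamed. The sketch below is a hypothetical Python helper (`classify`, `find_suspect_files` are illustrative names); its checks are heuristics assumed from this post (GIF magic bytes, NUL bytes, non-whitespace after `</body>`), not a guarantee of what makes the parser hang.

```python
import os

# GIF files begin with one of these two signatures.
GIF_MAGIC = (b"GIF87a", b"GIF89a")
# Extensions routed to the HTML parser in the config line above.
HTML_EXTS = (".htm", ".html", ".shtml")


def classify(data):
    """Return a reason string if data looks unsafe to hand to an HTML
    parser, else None. Heuristic only."""
    if data.startswith(GIF_MAGIC):
        return "GIF image with an HTML extension"
    if b"\0" in data:
        return "binary content (NUL bytes)"
    lower = data.lower()
    end = lower.rfind(b"</body>")
    if end != -1:
        # Ignore a closing </html> and whitespace after the last </body>.
        tail = lower[end + len(b"</body>"):].replace(b"</html>", b"").strip()
        if tail:
            return "content after </body>"
    return None


def find_suspect_files(root):
    """Walk root and yield (path, reason) for HTML-named files that look bad."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in HTML_EXTS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                reason = classify(fh.read())
            if reason:
                yield path, reason
```

Usage: `for path, reason in find_suspect_files("/path/to/docs"): print(path, reason)`, then deal with the listed files before running swish.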

TIA
Received on Thu Oct 17 12:05:09 2002