I am using swish 2.2.1 on Red Hat Linux 6.1 (kernel 2.2.12-20), and here is
the output of querying rpm:
swish@nera$ rpm -qa| grep xml
If I put the following line in my config file:
IndexContents HTML2 .htm .html .shtml
indexing goes fine on regular files (.html .htm .doc .pdf), but if swish tries
to index an x.html file that is not really an HTML file (in my case it is
actually a GIF picture, named by mistake), swish just sits forever on that
file. The same thing happens if an HTML file is badly malformed (for example,
something after the </body> tag).
Changing the config file to:
IndexContents HTML .htm .html .shtml
helps. Then swish works OK.
Is there some setting to instruct swish to disregard a "bad" file? Is there
any other solution for using the libxml2 parser in such a case?
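
The only workaround I can think of on my side is to find the mislabeled files
before indexing and rename or exclude them. Below is a rough Python sketch of
that idea; it does not use any swish setting. The function names, the signature
list and the 512-byte sniff size are my own assumptions for illustration. It
only catches binary files with an HTML extension (the GIF case), not HTML that
is merely malformed.

```python
import os
import sys

# Leading bytes of a few common binary formats (illustrative, not exhaustive).
BINARY_SIGNATURES = (
    b"GIF87a",
    b"GIF89a",
    b"\x89PNG\r\n\x1a\n",
    b"\xff\xd8\xff",  # JPEG
    b"%PDF-",
)

# The extensions listed on the IndexContents line.
HTML_EXTENSIONS = (".htm", ".html", ".shtml")


def looks_binary(path, sniff=512):
    """Return True if the start of the file looks like binary data."""
    with open(path, "rb") as fh:
        head = fh.read(sniff)
    if head.startswith(BINARY_SIGNATURES):
        return True
    # A NUL byte near the start is a strong hint that this is not text.
    return b"\x00" in head


def find_mislabeled(root):
    """Return sorted paths under root that have an HTML extension but look binary."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(HTML_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            if looks_binary(path):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Print one suspicious path per line; default to the current directory.
    for path in find_mislabeled(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Run against the document root, this prints the files I would have to rename or
keep out of the index by hand before running swish with HTML2.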
Received on Thu Oct 17 12:05:09 2002