Hi all,
Just a few comments about XML support in swish-e.
Support for this type of document is limited;
the only supported format is:
<fieldname>
bla, bla, ...
</fieldname>
This feature is built on the old metaname extraction
code. In fact, it is almost identical
to:
<!--META NAME="fieldname" START>
bla, bla, ...
<!--META END>
I have also added the feature of nesting XML tags:
<fieldname1>
data1
<fieldname2>
data2
</fieldname2>
data1
</fieldname1>
In the same way, this is also possible:
<!--META NAME="fieldname1" START>
data1
<!--META NAME="fieldname2" START>
data2
<!--META END>
data1
<!--META END>
With this approach, you may find data2 in both fieldname1 and
fieldname2.
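The nesting behaviour can be pictured with a small sketch. This is not the swish-e code, just a Python illustration under some assumptions: the names extract_fields and _add_words are hypothetical, tags carry no attributes, and tags not listed in the metanames are simply skipped. Every word is credited to all fields that are open at that point, which is why data2 ends up in both fields.

```python
import re
from collections import defaultdict

# Matches <name> and </name>; attributes are not handled (assumption).
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def _add_words(chunk, open_fields, fields):
    """Credit every word in chunk to each currently open field."""
    words = chunk.split()
    for name in set(open_fields):
        fields[name].extend(words)


def extract_fields(text, metanames):
    """Map each declared field name to the words found inside it."""
    fields = defaultdict(list)
    open_fields = []  # stack of field names that are currently open
    pos = 0
    for match in TAG_RE.finditer(text):
        _add_words(text[pos:match.start()], open_fields, fields)
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if name not in metanames:
            continue  # assumption: undeclared tags are ignored
        if closing:
            if open_fields and open_fields[-1] == name:
                open_fields.pop()
        else:
            open_fields.append(name)
    _add_words(text[pos:], open_fields, fields)
    return dict(fields)


doc = """<fieldname1>
data1
<fieldname2>
data2
</fieldname2>
data1
</fieldname1>"""

result = extract_fields(doc, {"fieldname1", "fieldname2"})
print(result)
```

A stack is used so that a closing tag only ends the innermost open field; the same idea would apply to the META START/END comments.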
To index these XML tags, you have to list the field names
in the MetaNames directive of your config file. If you
have too many field names, you can instead specify automatic in the
MetaNames directive.
BTW, as Dave says, using automatic is not a good idea if your files
are HTML, because title, body, etc. are extracted as fields.
Anyway, this is more or less how 2.0.1 works with metanames.
As Rainer pointed out in a previous message, a better
approach is a new directive, IndexContents, to help the
index engine decide what to do with each document.
This feature, combined with FileFilter, will make swish-e more
powerful. E.g. (taken from Rainer's post):
IndexContents HTML .html .htm .shtml .htm. .html. .shtml.
IndexContents XML .xml
IndexContents WAP .wap .wml
IndexContents TXT .txt .txt.
IndexContents TXT .pdf .doc .dot .xls
FileFilter .doc doc-filter.sh
FileFilter .dot doc-filter.sh
FileFilter .pdf pdf-filter.sh
FileFilter .xls xls-filter.sh
HTML, WAP, XML and TXT may each have their own built-in parser function
in swish-e.
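To make the idea concrete, here is a rough Python sketch of how the two directives could work together. It is only an illustration, not how swish-e does it: parse_config, lookup and plan are hypothetical names, and matching a file by its longest suffix is my assumption about the intended semantics.

```python
def parse_config(lines):
    """Collect suffix -> content type and suffix -> filter program tables."""
    contents, filters = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        directive, first, rest = parts[0], parts[1], parts[2:]
        if directive == "IndexContents":
            # IndexContents TYPE .ext1 .ext2 ...
            for ext in rest:
                contents[ext] = first
        elif directive == "FileFilter":
            # FileFilter .ext program
            filters[first] = rest[0]
    return contents, filters


def lookup(filename, table):
    """Return the entry for the longest matching suffix, or None."""
    matches = [ext for ext in table if filename.endswith(ext)]
    return table[max(matches, key=len)] if matches else None


def plan(filename, contents, filters):
    """Return (filter program or None, parser type or None) for a file."""
    return lookup(filename, filters), lookup(filename, contents)


config = [
    "IndexContents HTML .html .htm",
    "IndexContents XML .xml",
    "IndexContents TXT .txt .pdf",
    "FileFilter .pdf pdf-filter.sh",
]
contents, filters = parse_config(config)
for name in ("report.pdf", "feed.xml", "README"):
    print(name, plan(name, contents, filters))
```

So a .pdf file would first go through pdf-filter.sh and the output would then be handed to the TXT parser, while a file with no matching suffix gets neither.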
cu
Jose
Received on Tue Aug 29 09:59:06 2000