XML and swish-e 2.0.1

From: <jmruiz(at)not-real.boe.es>
Date: Tue Aug 29 2000 - 09:54:47 GMT
Hi all,

Just a few comments about XML support in swish-e.

Support for this type of document is limited;
the only supported format is:

<fieldname> 
bla, bla, ...
</fieldname>

Support for this feature is based on the old metaname
extraction code. In fact, it is almost identical
to:
<!--META NAME="fieldname" START>
bla, bla, ...
<!--META END>

I have also added the feature of nesting XML tags:
<fieldname1>
data1
<fieldname2> 
data2
</fieldname2>
data1
</fieldname1>

In the same way, this is also possible:
<!--META NAME="fieldname1" START>
data1
<!--META NAME="fieldname2" START>
data2
<!--META END>
data1
<!--META END>

With this approach, you may find data2 in both fieldname1 and 
fieldname2.
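
To make the nesting behaviour concrete, here is a rough Python sketch of
the idea. It is not the swish-e code (that is written in C); the function
name index_nested_fields and the simple regex tokenizer are just
assumptions for illustration. It keeps a stack of open fields and credits
every word to all fields that are open at that point:

```python
import re
from collections import defaultdict

# Matches simple <name> and </name> tags without attributes.
TAG_RE = re.compile(r"<(/?)(\w+)>")

def index_nested_fields(text, metanames):
    """Map each metaname to the words found inside it, nested ones included."""
    fields = defaultdict(list)
    open_fields = []  # stack of field names that are currently open
    pos = 0
    for match in TAG_RE.finditer(text):
        # Words between the previous tag and this one belong to every open field.
        for word in text[pos:match.start()].split():
            for name in open_fields:
                fields[name].append(word)
        closing, name = match.group(1), match.group(2)
        if name in metanames:
            if closing:
                # Only pop when the closing tag matches the innermost open field.
                if open_fields and open_fields[-1] == name:
                    open_fields.pop()
            else:
                open_fields.append(name)
        pos = match.end()
    return dict(fields)

sample = """<fieldname1>
data1
<fieldname2>
data2
</fieldname2>
data1
</fieldname1>"""

result = index_nested_fields(sample, {"fieldname1", "fieldname2"})
print(result)
```

Here data2 ends up under both fieldname1 and fieldname2, while data1 is
only found under fieldname1.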

To be able to index these XML tags you have to specify the names
of the fields in the MetaNames directive of your config file. If you
have too many field names you can also specify automatic in the
MetaNames directive.
BTW, as Dave says, using automatic is not a good idea if your files 
are HTML, because title, body, etc. are extracted as fields.

Anyway, this is more or less how 2.0.1 works with metanames.

As Rainer pointed out in a previous message, a better
approach is to use a new directive, IndexContents, to help the
index engine decide what to do with each document.
This feature, combined with FileFilter, will make swish-e more
powerful. E.g. (taken from Rainer's post):

IndexContents   HTML  .html .htm .shtml   .htm.  .html. .shtml.
IndexContents   XML   .xml
IndexContents   WAP   .wap .wml
IndexContents   TXT   .txt .txt.
IndexContents   TXT   .pdf .doc .dot .xls

FileFilter      .doc  doc-filter.sh
FileFilter      .dot  doc-filter.sh
FileFilter      .pdf  pdf-filter.sh
FileFilter      .xls  xls-filter.sh

HTML, WAP, XML and TXT may each have their own built-in parser
function in swish-e.
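
As a rough illustration of how the two directives could work together,
here is a small Python sketch. It is only an assumption about the
behaviour, not how swish-e implements it: the tables and the function
plan_for are hypothetical, and only part of the extensions from the
example above are included. The file extension selects an optional
filter and the parser type for the (filtered) content:

```python
import os

# Hypothetical in-memory form of some of the IndexContents lines above.
INDEX_CONTENTS = {
    "HTML": [".html", ".htm", ".shtml"],
    "XML": [".xml"],
    "WAP": [".wap", ".wml"],
    "TXT": [".txt", ".pdf", ".dot", ".xls"],
}

# Hypothetical in-memory form of the FileFilter lines above.
FILE_FILTER = {
    ".doc": "doc-filter.sh",
    ".dot": "doc-filter.sh",
    ".pdf": "pdf-filter.sh",
    ".xls": "xls-filter.sh",
}

def plan_for(filename):
    """Return (filter_script_or_None, parser_type_or_None) for a file name."""
    ext = os.path.splitext(filename)[1].lower()
    parser = None
    for doc_type, extensions in INDEX_CONTENTS.items():
        if ext in extensions:
            parser = doc_type
            break
    return FILE_FILTER.get(ext), parser

print(plan_for("manual.pdf"))
print(plan_for("catalog.xml"))
```

So a .pdf file would first go through pdf-filter.sh and the output would
then be handled by the TXT parser, while an .xml file would go straight
to the XML parser.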

cu
Jose
Received on Tue Aug 29 09:59:06 2000