At 08:44 AM 04/19/02 -0700, Cristiano Corsani wrote:
> <field type="01">
> <subfield type="a">valueA</subfield>
> <subfiled type="b">valueB</subfield>
> <field type="02">
> <subfield type="b">valueAA</subfield>
> <subfiled type="c">valueBB</subfield>
> <field type="03">
> <subfield type="e">valueAAA</subfield>
> <subfiled type="f">valueBBB</subfield>
>I want to index only type 01 field so I write this config file:
Here's a debugging tip:
> cat c
The "auto" will make everything a meta tag, which helps this display (try
> ./swish-e -c c -i 1.xml -v0 -T parsed_tags
Indexing Data Source: "File-System"
<record> (meta [record])
<field> (meta [field])
<field> (meta [field.01])
<subfield> (meta [subfield])
<subfield> (meta [subfield.a])
<subfiled> (meta [subfiled])
<subfiled> (meta [subfiled.b])
You can't do that because you don't list <record>, so it's undefined,
and then everything *nested* inside it is ignored.
It's a good argument that if a metaname IS defined then it should start
processing again for that metaname. But the original design was if some
metatag stopped processing, processing didn't start back up again until its
closing tag was found, regardless of what was in between.
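
To make that rule concrete, here is a toy model of the behavior described above. It is a sketch in Python, not swish-e's actual code: the function name indexed_text and the use of plain tag names as stand-ins for metanames are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def indexed_text(xml_text, defined):
    """Return the text chunks a walker following the described rule would index.

    `defined` is a set of tag names standing in for defined metanames
    (an assumption of this sketch, not swish-e's real data structure).
    """
    root = ET.fromstring(xml_text)
    out = []

    def walk(elem, ignoring):
        # Once an undefined tag turns indexing off, it stays off for the
        # whole subtree, even if a defined tag appears deeper inside.
        ignoring = ignoring or elem.tag not in defined
        if not ignoring and elem.text and elem.text.strip():
            out.append(elem.text.strip())
        for child in elem:
            walk(child, ignoring)

    walk(root, False)
    return out

SAMPLE = "<record><field><subfield>valueA</subfield></field></record>"

# <record> is not listed, so nothing nested inside it is indexed.
print(indexed_text(SAMPLE, {"field", "subfield"}))
# Listing <record> as well lets the nested text through.
print(indexed_text(SAMPLE, {"record", "field", "subfield"}))
```

In this model, leaving the outermost tag out of the defined set is enough to suppress every nested tag, which matches what the parsed_tags output shows.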
So the question is how to index just some deeply nested tags? I'm not sure
you can without using -S prog and parsing the XML yourself.
>A second question: is it possible to tell swish to index only subfield "a"
>inside field "01" and subfield "b" inside field "02"? How can I write a
>correct config file?
That's the problem -- there are too many ways one might want to cut up a
file for indexing. Anything's possible, and the libxml2 parser makes things
a lot easier. The above XML-specific settings were relatively easy hacks,
but they don't work for everything.
Using -S prog will be by far the fastest method and the most powerful for
you, if you are comfortable with parsing XML in your own program.
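
As a rough sketch of what such a program could look like, the Python script below keeps only subfield "a" inside field "01" and subfield "b" inside field "02", then writes a reduced document to standard output. Several things here are assumptions: the input is well-formed XML with a <record> root, the output tag names (f01_a, f02_b) are made up for illustration, and the Path-Name and Content-Length headers follow the -S prog interface as I understand it, so check them against the documentation for your swish-e version.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical selection: (field type, subfield type) pairs to keep.
WANTED = {("01", "a"), ("02", "b")}

def filter_record(xml_text, wanted=WANTED):
    """Return a reduced XML string holding only the wanted subfields."""
    root = ET.fromstring(xml_text)
    out = ET.Element("record")
    for field in root.iter("field"):
        ftype = field.get("type")
        for sub in field.findall("subfield"):
            stype = sub.get("type")
            if (ftype, stype) in wanted:
                # Made-up flat tag name, e.g. f01_a, usable as a metaname.
                kept = ET.SubElement(out, "f%s_%s" % (ftype, stype))
                kept.text = sub.text
    return ET.tostring(out, encoding="unicode")

def format_document(path, body):
    """Prefix the body with the headers assumed for -S prog input."""
    data = body.encode("utf-8")
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(data))
    return header.encode("utf-8") + data

if __name__ == "__main__":
    # Each command-line argument is treated as one XML file to filter.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            reduced = filter_record(fh.read())
        sys.stdout.buffer.write(format_document(path, reduced))
```

Because the script flattens each wanted field/subfield pair into its own tag, the config file only has to list those flat names as metanames, and no nesting rules come into play.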
Received on Fri Apr 19 16:36:03 2002