Re: Problem with XMLClassAttributes

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Apr 19 2002 - 16:34:37 GMT
At 08:44 AM 04/19/02 -0700, Cristiano Corsani wrote:
><record>
>	<field type="01">
>		<subfield type="a">valueA</subfield>
>		<subfiled type="b">valueB</subfield>
>	</field>
>	<field type="02">
>		<subfield type="b">valueAA</subfield>
>		<subfiled type="c">valueBB</subfield>
>	</field>
>	<field type="03">
>		<subfield type="e">valueAAA</subfield>
>		<subfiled type="f">valueBBB</subfield>
>	</field>
></record>
>
>I want to index only type 01 field so I write this config file:
>
>..
>XMLClassAttributes field
                    ^^^^^

You want

XMLClassAttributes type

Here's a debugging tip:

> cat c
DefaultContents XML2
XMLClassAttributes type
UndefinedMetaTags auto

The "auto" setting makes every tag a meta tag, which makes this display more
informative (try it without to compare).


> ./swish-e -c c -i 1.xml -v0 -T parsed_tags 
Indexing Data Source: "File-System"
<record> (meta [record])
    <field> (meta [field])
        <field> (meta [field.01])
        <subfield> (meta [subfield])
            <subfield> (meta [subfield.a])
        </subfield> (meta)
        </subfield> (meta)
        <subfiled> (meta [subfiled])
            <subfiled> (meta [subfiled.b])
        </subfiled> (meta)
        </subfiled> (meta)
    </field> (meta)
    </field> (meta)
</record> (meta)
Indexing done!



>UndefinedMetaTags ignore

You can't do that because you don't list <record> as a metaname, so it's
undefined, and then everything *nested* inside it is ignored.

There's a good argument that if a nested metaname IS defined then processing
should start again for that metaname.  But the original design was that once
some metatag stopped processing, processing didn't start back up again until
that tag's closing tag was found, regardless of what was in between.
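
A toy model of that design may make the effect easier to see. This is only a
sketch of the behavior described above, not swish-e's actual parser: the
function name `indexed_text` and the `defined` set are hypothetical, and class
attributes are left out to keep it short.

```python
import xml.etree.ElementTree as ET

def indexed_text(xml_text, defined):
    """Toy model: return the text kept when undefined tags are ignored."""
    kept = []

    def walk(elem):
        if elem.tag not in defined:
            # Undefined tag: skip it and everything nested inside it,
            # even if a nested tag is itself defined.
            return
        if elem.text and elem.text.strip():
            kept.append(elem.text.strip())
        for child in elem:
            walk(child)

    walk(ET.fromstring(xml_text))
    return kept

doc = "<record><field><subfield>valueA</subfield></field></record>"
print(indexed_text(doc, {"subfield"}))                     # []
print(indexed_text(doc, {"record", "field", "subfield"}))  # ['valueA']
```

In the first call the outer <record> is undefined, so the defined <subfield>
nested inside it is never reached.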

So the question is how to index just some deeply nested tags?  Not sure you
can without using -S prog and parsing the XML yourself.

>A second question: is it possible to tell swish to index only subfield "a"
>inside field "01" and subfield "b" inside field "02"? How can I write a
>correct config file?

That's the problem -- there are too many ways one might want to cut up a file
for indexing.  Anything's possible, and the libxml2 parser makes things a
lot easier.  The above XML-specific settings were relatively easy hacks,
but they don't work for everything.

Using -S prog will be by far the fastest method and the most powerful for
you, if you are comfortable with parsing XML in your own program.
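
A minimal sketch of the -S prog route, for the case asked about (subfield "a"
inside field "01" and subfield "b" inside field "02"). It assumes the prog
interface reads documents from the program's standard output as `Path-Name:`
and `Content-Length:` headers, a blank line, then the content; check the
swish-e documentation for your version before relying on that. The names
`WANTED`, `filter_record` and `prog_document` are hypothetical, and the input
is assumed to be well-formed XML (the sample in the question has mismatched
<subfiled>/<subfield> tags, which a strict parser will reject).

```python
import sys
import xml.etree.ElementTree as ET

# Selection rule from the question: field type -> subfield types to keep.
WANTED = {"01": {"a"}, "02": {"b"}}

def filter_record(xml_text, wanted=WANTED):
    """Return a new <record> string holding only the wanted subfields."""
    src = ET.fromstring(xml_text)
    out = ET.Element("record")
    for field in src.findall("field"):
        keep = wanted.get(field.get("type"), set())
        subs = [s for s in field.findall("subfield") if s.get("type") in keep]
        if not subs:
            continue  # nothing wanted in this field, drop it entirely
        new_field = ET.SubElement(out, "field", type=field.get("type"))
        for s in subs:
            new_sub = ET.SubElement(new_field, "subfield", type=s.get("type"))
            new_sub.text = s.text
    return ET.tostring(out, encoding="unicode")

def prog_document(path, content):
    """Wrap content in the headers assumed above; the length is in bytes."""
    body = content.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return header.encode("utf-8") + body

if __name__ == "__main__":
    # Each command-line argument is treated as an XML file to filter.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            doc = filter_record(fh.read())
        sys.stdout.buffer.write(prog_document(path, doc))
```

Because the program only emits the elements that should be searchable, the
config no longer needs "UndefinedMetaTags ignore" to prune the rest.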


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Apr 19 16:36:03 2002