Re: XML attributes in XML element content

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Feb 19 2004 - 15:14:14 GMT
On Wed, Feb 18, 2004 at 10:02:12PM -0800, Dave Moreau wrote:
> Basically, I want to be able to return properties that appear in the middle 
> of the document without them being included in the containing context. For 
> the file

You can, to some degree.

$ cat c
DefaultContents XML2
MetaNames name first last
PropertyNames name first last

$ cat 1.xml
<xml>
<name>
<first>Bill</first>
<last>Moseley</last>
</name>
</xml>

$ swish-e -w name=bill -p last -H0
1000 1.xml "1.xml" 69 "Moseley"

$ swish-e -w name=bill -p name -H0
1000 1.xml "1.xml" 69 "Bill Moseley"

Here's an example using XMLClassAttributes:

$ cat c
DefaultContents XML2
MetaNames name.first name.last
PropertyNames name.first name.last
XMLClassAttributes class

$ cat 1.xml
<xml>
<name class="first">Bill</name>
<name class="last">Moseley</name>
</xml>

$ swish-e -c c -i 1.xml -v 0 -T indexed_words properties
    Adding:[1:name.first(10)]   'bill'   Pos:4  Stuct:0x1 ( FILE )
    Adding:[1:name.last(11)]   'moseley'   Pos:8  Stuct:0x1 ( FILE )
          swishdocpath: 6 (  5) S: "1.xml"
          swishdocsize: 8 (  4) N: "79"
     swishlastmodified: 9 (  4) D: "2004-02-19 06:43:20 PST"
            name.first:12 (  4) S: "Bill"
             name.last:13 (  7) S: "Moseley"

>   <elem attrib="blah">nonsense</elem>
> 
> where elem is defined
> 
>   MetaNames elem
> 
> and attrib is defined
> 
>   PropertyNames elem.attrib
> 
> And 'elem' is an alias for 'swishdefault', I do not want 'blah' to be 
> indexed as appearing in elem. Searching the index with '-w blah' should not 
> return this document, but '-p elem.attrib' should return the property.

I'm not sure I follow, but in that case the above example is not
what you are looking for.  You may not be able to do this with swish-e's
current config options, because swish-e assumes that metanames are nested
(which is what allows the first example, where you can search the more
general metaname "name" or the more specific "first" or "last").

UndefinedXMLAttributes doesn't really provide all the control you need
-- and there really need to be separate ways to ignore metanames and
to ignore property names.

Still, you can do:

$ cat c
DefaultContents XML2
PropertyNames elem.attrib
UndefinedXMLAttributes ignore

$ cat 1.xml
<xml>
<elem attrib="blah">nonsense</elem>
</xml>

$ swish-e -c c -i 1.xml -v 0 -T indexed_words properties
    Adding:[1:swishdefault(1)]   'nonsense'   Pos:6  Stuct:0x1 ( FILE )
          swishdocpath: 6 (  5) S: "1.xml"
          swishdocsize: 8 (  4) N: "49"
     swishlastmodified: 9 (  4) D: "2004-02-19 06:58:52 PST"
           elem.attrib:10 (  4) S: "blah"

Which kind of does what you are asking, I think:

$ swish-e -w blah -H0
(no results)
$ swish-e -w not blah -p elem.attrib -H0
1000 1.xml "1.xml" 49 "blah"

> I find it interesting that the property appears in its context, yet the 
> context disappears when swish-e returns the property. For example:
> 
>   <elem attrib="blah">nonsense</elem> lksjfs ds dsfs df  <elem 
> attrib="lalala">more </elem>
> 
> My experience is that using '-p elem.attrib' returns a space-delimited 
> string of all occurrences of elem.attrib (thus "blah lalala" would be 
> returned). I like this behavior. I wish I could specify:
> 
>   PropertyNames   elem.attrib
> 
> when I want it indexed and
> 
>   PropertyNamesNoIndex   elem.attrib
> 
> when I don't.

There's IgnoreMetaTags but I think it works for both MetaNames and
Properties.  It would be nice to have a config option for each.

Again, I think there are so many ways people want to do this that
creating config options in swish to handle them all would be hard, if
not confusing to use.  Using an external parser seems like an easier
and faster way to go.
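
As a rough illustration of the external-parser idea, here is a minimal
Python sketch that rewrites a document before swish-e sees it: each
attribute is removed from its element and emitted as a separate
root-level element named tag.attr, so the value no longer sits inside
the containing element.  The function name hoist_attributes and the
tag.attr naming are assumptions made for this example, not anything
swish-e provides, and how the result is fed to swish-e and which config
directives then apply is left open.

```python
import xml.etree.ElementTree as ET


def hoist_attributes(xml_text):
    """Move every attribute out of its element into a root-level
    <tag.attr> element (hypothetical naming chosen for this sketch)."""
    root = ET.fromstring(xml_text)
    hoisted = []
    # Snapshot the tree first so appending new nodes later cannot
    # disturb the iteration.
    for el in list(root.iter()):
        for attr, value in list(el.attrib.items()):
            node = ET.Element(f"{el.tag}.{attr}")
            node.text = value
            hoisted.append(node)
        # Drop the attributes from the original element.
        el.attrib.clear()
    # Attribute values now live outside their original context.
    root.extend(hoisted)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hoist_attributes('<xml><elem attrib="blah">nonsense</elem></xml>'))
```

With the 1.xml from the last example as input, the attribute value ends
up in its own <elem.attrib> element next to <elem> rather than on it,
which leaves the decision of what to index and what to store as a
property to the filter rather than to swish-e's nesting rules.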

-- 
Bill Moseley
moseley@hank.org
Received on Thu Feb 19 07:14:17 2004