Re: [swish-e] hierarchical metanames

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Oct 05 2007 - 18:50:13 GMT
On 10/05/2007 01:37 PM, Ravi Murthy wrote:
> Compounding the queries doesn't quite give the results that I want - because 
> it combines the results at the document level but not a specific node. 
> Consider the document below.
> 
> <root>
>    <a>
>      <b>bar</b>
>    </a>
>    <a>
>      <c>foo</c>
>    </a>
>    <b>foo</b>
> </root>
> 
> a.b = (foo)  -- SHOULD BE FALSE
> 
> but
> 
> a = (foo) AND b = (foo) -- WILL RETURN TRUE
> 

swish-e's parser "flattens" the DOM at indexing time. In fact, swish-e's parser
knows nothing about the DOM at all. It uses SAX. So no, to answer your original
question, Swish-e doesn't have that kind of feature built-in.

However, you might accomplish the same effect by pre-parsing the XML and feeding
the result to swish-e with the -S prog option. Then you could mimic the
hierarchy with the tag names themselves. That has some trade-offs, since you
couldn't search for just 'a = foo', for example, unless your pre-parsing
flattened the tags in all the combinations that you'd want to be able to search
for later.
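
As a rough illustration of the pre-parsing step, here is a minimal Python
sketch that rewrites nested elements as dotted tag names. The function name
flatten and the "." separator are assumptions made for this example, not
anything swish-e provides. It only keeps text that appears directly inside an
element (text trailing a child element is ignored) and it drops attributes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def flatten(xml_text, sep="."):
    """Return XML where each text-bearing element is renamed to its dotted path.

    Hypothetical helper for illustration: <a><b>bar</b></a> becomes <a.b>bar</a.b>.
    The root element is kept as the wrapper and is not part of the path.
    """
    root = ET.fromstring(xml_text)
    lines = ["<%s>" % root.tag]

    def walk(elem, path):
        for child in elem:
            child_path = path + [child.tag]
            text = (child.text or "").strip()
            if text:
                name = sep.join(child_path)
                lines.append("   <%s>%s</%s>" % (name, escape(text), name))
            # Recurse so deeper levels become a.b.c and so on.
            walk(child, child_path)

    walk(root, [])
    lines.append("</%s>" % root.tag)
    return "\n".join(lines)
```

Run against the document quoted above, this should produce the same shape as
the nest.xml file shown below. To also allow a plain 'a = foo' search, the
walk would have to emit extra copies of the text under the shorter tag names
as well, which is the trade-off mentioned above.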

[pek@dewpoint:~/tmp]$ cat conf
UndefinedMetaTags auto
DefaultContents XML

[pek@dewpoint:~/tmp]$ cat nest.xml
<root>
   <a.b>bar</a.b>
   <a.c>foo</a.c>
   <b>foo</b>
</root>

[pek@dewpoint:~/tmp]$ swish-e -w a.b = foo
# SWISH format: 2.5.6
# Search words: a.b = foo
# Removed stopwords:
err: no results

[pek@dewpoint:~/tmp]$ swish-e -w a.c = foo
# SWISH format: 2.5.6
# Search words: a.c = foo
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.007 seconds
1000 nest.xml "nest.xml" 66
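
For the -S prog route, the external program writes each document to stdout as
a small header block, a blank line, and then the content. The sketch below
assumes only the Path-Name and Content-Length headers and relies on the
"DefaultContents XML" line in the conf above to select the XML parser. The
function name prog_record and the script layout are made up for this example,
so check the -S prog section of the swish-e documentation before relying on it.

```python
import sys


def prog_record(path, xml_text):
    """Build one document record for swish-e's -S prog input (illustrative).

    Content-Length must be the byte length of the body, so the text is
    encoded first and measured afterwards.
    """
    body = xml_text.encode("utf-8")
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(body))
    return header.encode("utf-8") + body


if __name__ == "__main__":
    # Each argument is assumed to be an already-flattened XML file.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            sys.stdout.buffer.write(prog_record(path, fh.read()))
```

If this were saved as an executable script (say flatten.py, a hypothetical
name), the indexing run would look something like
"swish-e -c conf -S prog -i ./flatten.py", with the file list supplied however
suits your setup.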

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

Received on Fri Oct 5 14:50:15 2007