Re: parsing question

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Oct 03 2003 - 00:05:48 GMT
On Thu, Oct 02, 2003 at 03:04:10PM -0700, Peter Karman wrote:
> I have an HTML document that contains this markup:
> 
> 
> <tt CLASS="literal">
> -h
>   [
> <span CLASS="optional">
> no
> </span>
> ]
> aggress
> </tt>
> 
> 
> And I would like a search for the following phrase to find that doc:
> 
> "-h [no]aggress"

I guess this is a bug.

moseley@bumby:~$ cat t.xml
<xml>
<tt CLASS="literal">
-h
  [
<span CLASS="optional">
no
</span>
]
aggress
</tt>
</xml>

Here's the output with the XML parser:

moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'h'   Pos:3  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'no'   Pos:5  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'aggress'   Pos:7  Stuct:0x1 ( FILE )

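The thing to note is that the word position gets bumped at each tag:
'h' is at position 3, 'no' at 5 and 'aggress' at 7, so the words are no
longer adjacent and the phrase search can't match them.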

If I use the XML2 parser (libxml2) I get:

moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'h'   Pos:7  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'no'   Pos:8  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'aggress'   Pos:9  Stuct:0x1 ( FILE )

moseley@bumby:~$ swish-e -w '"-h [no] aggress"' -H0
1000 t.xml "t.xml" 93
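
To illustrate why the positions matter, here is a toy sketch in Python.
It is not swish-e's code; it just assumes a phrase matches only when its
words sit at consecutive positions. The function name and the dict
layout are made up for the example, and the positions are copied from
the two indexing runs above.

```python
def phrase_matches(index, phrase):
    """Return True if the words of `phrase` occur at consecutive positions.

    `index` maps each word to the list of positions where it was indexed.
    This is a simplified model, not swish-e's actual matching code.
    """
    first, *rest = phrase
    for start in index.get(first, []):
        # Every following word must sit exactly one position further on.
        if all(start + offset in index.get(word, [])
               for offset, word in enumerate(rest, start=1)):
            return True
    return False


# Word positions as reported by -T indexed_words for each parser.
xml_parser = {"h": [3], "no": [5], "aggress": [7]}
xml2_parser = {"h": [7], "no": [8], "aggress": [9]}

query = ["h", "no", "aggress"]
print(phrase_matches(xml_parser, query))   # False: gaps at positions 4 and 6
print(phrase_matches(xml2_parser, query))  # True: positions 7, 8, 9
```

With the XML parser's positions the phrase fails because of the gaps;
with the XML2 positions it matches, which agrees with the search result
above.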

Can you use libxml2?

-- 
Bill Moseley
moseley@hank.org
Received on Fri Oct 3 00:05:52 2003