Re: parsing question

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Oct 03 2003 - 20:55:00 GMT

For the archive:

libxml2 did solve my problem. Thanks!

pek

Bill Moseley wrote:

> On Thu, Oct 02, 2003 at 03:04:10PM -0700, Peter Karman wrote:
> 
>>I have an HTML document that contains this markup:
>>
>>
>><tt CLASS="literal">
>>-h
>>  [
>><span CLASS="optional">
>>no
>></span>
>>]
>>aggress
>></tt>
>>
>>
>>And I would like a search for the following phrase to find that doc:
>>
>>"-h [no]aggress"
> 
> 
> I guess this is a bug.
> 
> moseley@bumby:~$ cat t.xml
> <xml>
> <tt CLASS="literal">
> -h
>   [
> <span CLASS="optional">
> no
> </span>
> ]
> aggress
> </tt>
> </xml>
> 
> Here's the output with the XML parser:
> 
> moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
>     Adding:[1:swishdefault(1)]   'h'   Pos:3  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'no'   Pos:5  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'aggress'   Pos:7  Stuct:0x1 ( FILE )
> 
> Note that the tag bumps the word position: 'h', 'no' and 'aggress'
> land at positions 3, 5 and 7, so the words are not adjacent and the
> phrase search does not match.
> 
> If I use the XML2 parser I get:
> 
> moseley@bumby:~$ swish-e -c c -i t.xml -T indexed_words -v0
>     Adding:[1:swishdefault(1)]   'h'   Pos:7  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'no'   Pos:8  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'aggress'   Pos:9  Stuct:0x1 ( FILE )
> 
> moseley@bumby:~$ swish-e -w '"-h [no] aggress"' -H0
> 1000 t.xml "t.xml" 93
> 
> Can you use libxml2?
> 
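
For readers of the archive, here is a rough sketch of why the position numbers in the output above matter. It assumes a simplified model in which a phrase matches only when its words sit at consecutive positions. The function name and the dictionaries are made up for illustration; this is not swish-e's actual code.

```python
def phrase_matches(positions, phrase):
    """Return True if the words of `phrase` occur at consecutive positions.

    `positions` maps each indexed word to the list of positions where it
    was recorded (a hypothetical stand-in for the real index).
    """
    words = phrase.split()
    if not words:
        return False
    # Try every position of the first word as a possible phrase start.
    for start in positions.get(words[0], []):
        if all(start + i in positions.get(w, []) for i, w in enumerate(words)):
            return True
    return False


# Word positions copied from the two indexing runs quoted above.
xml_parser = {"h": [3], "no": [5], "aggress": [7]}
xml2_parser = {"h": [7], "no": [8], "aggress": [9]}

print(phrase_matches(xml_parser, "h no aggress"))   # False: gaps at 4 and 6
print(phrase_matches(xml2_parser, "h no aggress"))  # True: 7, 8, 9 are adjacent
```

Under that assumption, the gaps left by the XML parser break the phrase, while the XML2 (libxml2) run keeps the three words adjacent, which agrees with the search result shown above.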

-- 
Peter Karman - Software Publications Programmer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Fri Oct 3 20:55:09 2003