Re: meta names not included in swishdefault?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Feb 26 2002 - 19:33:51 GMT

At 11:24 AM 02/26/02 -0800, Fred Toth wrote:
>Regarding nested meta names: Is it reasonable to use "html" as
>the top level of nested meta names (using HTML2 as the IndexContents value)?

I'm not sure how reasonable it is ;)

Best to test:

> cat 1.html
<html>
<head>
<title>Titletext</title>
<group>
<meta name="meta1" content="meta1text">
<meta name="meta2" content="meta2text">
<meta name="meta3" content="meta3text">
</group>

</head>
<body>
Bodytext
</body>
</html>


> cat c
defaultcontents HTML2

metanames html meta1 meta2 meta3 group


> ./swish-e -c c -i 1.html -T indexed_words -v0
Indexing Data Source: "File-System"
    Adding:[1:html(10)]   'titletext'   Pos:2  Stuct:0x87 ( META HEAD TITLE FILE )
    Adding:[1:html(10)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:meta1(11)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:group(14)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:html(10)]   'meta2text'   Pos:9  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:meta2(12)]   'meta2text'   Pos:9  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:group(14)]   'meta2text'   Pos:9  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:html(10)]   'meta3text'   Pos:12  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:meta3(13)]   'meta3text'   Pos:12  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:group(14)]   'meta3text'   Pos:12  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:html(10)]   'bodytext'   Pos:16  Stuct:0x89 ( META BODY FILE )
Indexing done!

So you can see how the words are nested.  Everything has the "META" structure flag set, since everything is within <html>.
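
To make the nesting easier to read off, here is a small Python sketch that groups the trace lines by word and lists the metanames each word was added under. It is not part of swish-e; the regular expression and the function name metanames_by_word are my own assumptions, based only on the line format shown in the output above.

```python
import re
from collections import defaultdict

# A few lines copied from the "-T indexed_words" output above.
TRACE = """\
    Adding:[1:html(10)]   'titletext'   Pos:2  Stuct:0x87 ( META HEAD TITLE FILE )
    Adding:[1:html(10)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:meta1(11)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:group(14)]   'meta1text'   Pos:6  Stuct:0x85 ( META HEAD FILE )
    Adding:[1:html(10)]   'bodytext'   Pos:16  Stuct:0x89 ( META BODY FILE )
"""

# Assumed layout: Adding:[<file>:<metaname>(<id>)]   '<word>'   Pos:<n> ...
LINE_RE = re.compile(r"Adding:\[(\d+):(\w+)\((\d+)\)\]\s+'([^']*)'\s+Pos:(\d+)")


def metanames_by_word(trace):
    """Map each indexed word to the metanames it was added under, in order."""
    result = defaultdict(list)
    for line in trace.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not "Adding:" entries
        _file_num, metaname, _meta_id, word, _pos = match.groups()
        if metaname not in result[word]:
            result[word].append(metaname)
    return dict(result)


if __name__ == "__main__":
    for word, names in metanames_by_word(TRACE).items():
        print(word, "->", ", ".join(names))
```

For the sample lines, 'meta1text' comes out under html, meta1 and group, while 'titletext' and 'bodytext' come out under html only, which is the nesting described above.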

>Then, the "all" search would be:
>
>         swish -w html=smith

Yep.

Note that this will only work with libxml2 linked in.

This has only been possible for the last month or so, since I relaxed the requirements for metanames -- before that, metanames could only be non-HTML tags.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Feb 26 19:34:28 2002