On Fri, Apr 02, 2004 at 05:25:35AM -0800, Peter Karman wrote:
> The swish docs say (under MetaNames):
> ---
> When indexing HTML Swish-e indexes the HTML title as default text, so
> when searching Swish-e will find matches in both the HTML body and the
> HTML title. Swish also, by default, indexes content of meta tags. So:
>
> swish-e -w foo
>
> will find ``foo'' in the body, the title, or any meta tags.
Or rather, any meta tags that are not defined as MetaNames.
That's not very clear, so here I go....
Everything is stored as a metaname in the index. Metanames are
basically a way to have multiple indexes in the same index file.
By default, swish indexes text under the metaname "swishdefault". These
both search "swishdefault":
-w foo
-w swishdefault=(foo)
Now, by default (with no special config options) swish indexes HTML
<title>, <meta> and text extracted from <body> as swishdefault. But if
a meta tag's name is listed under MetaNames in the config, its content
is indexed under a separate metaname and cannot be searched with just
-w foo:
(Notice the HTML here is a bit broken, but it's still indexed.)
moseley@laptop:~$ cat test.html
<html>
<head>
<title>Title</title>
<meta name="meta1" content="meta1text">
<meta name="meta2" content="meta2text">
brokentext
<body>
bodytext
</body>
</html>
moseley@laptop:~$ cat c
ParserWarnLevel 9
MetaNames meta1
moseley@laptop:~$ swish-e -v0 -i test.html -c c -T indexed_words
Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:meta1(10)] 'meta1text' Pos:5 Stuct:0x85 ( META HEAD FILE )
Adding:[1:swishdefault(1)] 'meta2text' Pos:9 Stuct:0x5 ( HEAD FILE )
test.html:7: error: htmlParseStartTag: misplaced <body> tag
<body>
^
Adding:[1:swishdefault(1)] 'brokentext' Pos:11 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'bodytext' Pos:12 Stuct:0x9 ( BODY FILE )
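To make the routing rule concrete, here is a small Python toy model of it. This is only a sketch of the behaviour shown in the output above, not swish-e's actual parser; the class name MetaRouter and the "buckets" dictionary are made up for illustration. It sends the content of a <meta> tag to its own bucket only when the name is listed as a metaname, and everything else to "swishdefault".

```python
from html.parser import HTMLParser


class MetaRouter(HTMLParser):
    """Toy model (hypothetical, not swish-e code): route words to metaname buckets."""

    def __init__(self, metanames):
        super().__init__()
        # Names that would be listed on the MetaNames line of the config.
        self.metanames = {m.lower() for m in metanames}
        self.buckets = {}  # metaname -> list of lowercased words

    def _add(self, metaname, text):
        for word in text.lower().split():
            self.buckets.setdefault(metaname, []).append(word)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content") or ""
        # Defined metaname: its own bucket. Undefined: falls back to swishdefault.
        target = name if name in self.metanames else "swishdefault"
        self._add(target, content)

    def handle_data(self, data):
        # Title text and body text all land in swishdefault.
        self._add("swishdefault", data)


TEST_HTML = """<html>
<head>
<title>Title</title>
<meta name="meta1" content="meta1text">
<meta name="meta2" content="meta2text">
brokentext
<body>
bodytext
</body>
</html>
"""

router = MetaRouter(["meta1"])
router.feed(TEST_HTML)
router.close()
print(router.buckets)
```

With only meta1 defined, "meta1text" ends up alone under meta1, while "meta2text" joins the title and body words under swishdefault, which matches the debug output above.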
Swish can't (or doesn't) search all metanames at once because phrase
searches would not work right. So, to search multiple metanames you
can do:
-w swishdefault=(foo) OR meta1=(foo)
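If you build these queries from a script, the OR pattern is easy to generate. A minimal Python sketch follows; the helper name or_query is hypothetical, and the metaname list is assumed to come from your own config (meta1 here is the one defined in the example above).

```python
def or_query(term, metanames):
    """Build a query string that ORs the same term across several metanames."""
    return " OR ".join(f"{name}=({term})" for name in metanames)


# Search the default metaname plus one defined metaname.
query = or_query("foo", ["swishdefault", "meta1"])
print(query)
```

The resulting string goes after -w as a single argument. From a shell it needs quoting, since the parentheses are shell metacharacters.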
The other option is to nest the metanames:
<name>
<first>Bill</first>
<last>Moseley</last>
</name>
Then, with "MetaNames name first last" in the config, you get:
moseley@laptop:~$ swish-e -v0 -i x -c c -T indexed_words
Adding:[1:name(10)] 'bill' Pos:4 Stuct:0x89 ( META BODY FILE )
Adding:[1:first(11)] 'bill' Pos:4 Stuct:0x89 ( META BODY FILE )
Adding:[1:name(10)] 'moseley' Pos:7 Stuct:0x89 ( META BODY FILE )
Adding:[1:last(12)] 'moseley' Pos:7 Stuct:0x89 ( META BODY FILE )
and then you can search like:
-w first=(bill)
-w name=("bill moseley")
But, that does basically double the size of the index.
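To see where the doubling comes from, here is a toy Python model of nested metanames. It is a sketch under assumptions, not swish-e's indexer: the function index_nested is hypothetical, and the position counter is simplified (it just counts words, so the numbers will not match the Pos values in the output above). Each word is recorded once for every enclosing element that is a defined metaname.

```python
import xml.etree.ElementTree as ET


def index_nested(xml_text, metanames):
    """Toy model: return (metaname, word, position) postings for nested metanames."""
    metanames = set(metanames)
    postings = []
    position = 0

    def add(text, active):
        nonlocal position
        for word in (text or "").lower().split():
            position += 1
            # One posting per enclosing metaname; swishdefault if there is none.
            for name in (active or ["swishdefault"]):
                postings.append((name, word, position))

    def walk(elem, active):
        if elem.tag in metanames:
            active = active + [elem.tag]
        add(elem.text, active)
        for child in elem:
            walk(child, active)
            add(child.tail, active)  # text that follows the child's closing tag

    walk(ET.fromstring(xml_text), [])
    return postings


doc = "<name>\n<first>Bill</first>\n<last>Moseley</last>\n</name>"
postings = index_nested(doc, ["name", "first", "last"])
for posting in postings:
    print(posting)
```

Two words produce four postings: each is stored under "name" and again under "first" or "last". Because both words sit at adjacent positions under "name", a phrase search on that metaname can match them.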
--
Bill Moseley
moseley@hank.org
Received on Fri Apr 2 07:34:21 2004