Re: MetaName search not working, yet

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jan 29 2002 - 21:17:42 GMT
At 12:51 PM 01/29/02 -0800, you wrote:
>I just installed libxml2 (version 2.4.13) ... Are you saying that
><fake_meta>...</fake_meta> will or wont work with libxml2?

Will ONLY work with libxml2.


>I just tried again (with the same set of data and conf as mentioned earlier)
>using libxml2 and MetaNames do not seem to be catching either, just as
>before.

>Any ideas?

Sure.  Start slow.  In the libxml2 package there's a program called testHTML.  You can run  ./testHTML --sax test.html  and see how it's parsing your file.

Then when indexing use -T indexed_words parsed_tags

to see exactly what's happening with swish.

I just uploaded to CVS an updated version of parser.c with slightly better debugging messages for -T parsed_tags.  You can grab it from SourceForge if you are not using CVS.  (Or not bother.)


My suggestion would be to index plain HTML files first, just to make sure.  But here's your sample:

Remember, you have to tell swish to index the file as HTML2.  This config setting will work:

    DefaultContents HTML2


> cat 5.html
// <title>Guns and Butter</title>

globalPackage.description = '<meta_description>Some indexable words like
supply and demand, guns and butter.</meta_description>';
globalPackage.author = '<meta_author>Gordon Jessop</meta_author>';
globalPackage.foo = '1';
globalPackage.bar = 'checked';
globalPackage.blah = '123456';

> ./swish-e -c c -i 5.html -T parsed_tags  -v 0
Indexing Data Source: "File-System"
<meta_description> (meta [meta_description])
</meta_description> (meta)
<meta_author> (meta [meta_author])
</meta_author> (meta)
Indexing done!
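
If the -T parsed_tags output gets long, a small script can pull out which metanames the parser actually recognized. This is only a sketch: the line pattern is inferred from the sample output above, not taken from the swish-e source, and the function name metanames_seen is made up for illustration.

```python
import re

# Pattern inferred from the sample -T parsed_tags output above
# (an assumption, not taken from the swish-e source):
#   <meta_author> (meta [meta_author])
OPEN_META = re.compile(r"^<([^/>\s]+)>\s+\(meta \[([^\]]+)\]\)\s*$")


def metanames_seen(output):
    """Return metanames reported on opening tags, in order of first appearance."""
    seen = []
    for line in output.splitlines():
        match = OPEN_META.match(line.strip())
        if match and match.group(2) not in seen:
            seen.append(match.group(2))
    return seen


# The parsed_tags output shown above, pasted in as test data.
SAMPLE_TAGS = """\
Indexing Data Source: "File-System"
<meta_description> (meta [meta_description])
</meta_description> (meta)
<meta_author> (meta [meta_author])
</meta_author> (meta)
Indexing done!
"""

if __name__ == "__main__":
    print(metanames_seen(SAMPLE_TAGS))
```

An empty list would suggest the tags are not being treated as metanames at all, which is the symptom described in the original question.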

Or another look:

> ./swish-e -c c -i 5.html -T indexed_words  -v 0           
Indexing Data Source: "File-System"
    Adding:[1:swishdefault(1)]   'guns'   Pos:3  Stuct:0xB ( BODY TITLE FILE )
    Adding:[1:swishdefault(1)]   'and'   Pos:4  Stuct:0xB ( BODY TITLE FILE )
    Adding:[1:swishdefault(1)]   'butter'   Pos:5  Stuct:0xB ( BODY TITLE FILE )
    Adding:[1:swishdefault(1)]   'globalpackage'   Pos:8  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'description'   Pos:9  Stuct:0x9 ( BODY FILE )
    Adding:[1:meta_description(17)]   'some'   Pos:10  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'indexable'   Pos:11  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'words'   Pos:12  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'like'   Pos:13  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'supply'   Pos:14  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'and'   Pos:15  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'demand'   Pos:16  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'guns'   Pos:17  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'and'   Pos:18  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_description(17)]   'butter'   Pos:19  Stuct:0x89 ( META BODY FILE )
    Adding:[1:swishdefault(1)]   'globalpackage'   Pos:22  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'author'   Pos:23  Stuct:0x9 ( BODY FILE )
    Adding:[1:meta_author(18)]   'gordon'   Pos:24  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_author(18)]   'jessop'   Pos:25  Stuct:0x89 ( META BODY FILE )
    Adding:[1:swishdefault(1)]   'globalpackage'   Pos:27  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'foo'   Pos:28  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '1'   Pos:29  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'globalpackage'   Pos:30  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'bar'   Pos:31  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'checked'   Pos:32  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'globalpackage'   Pos:33  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'blah'   Pos:34  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   '123456'   Pos:35  Stuct:0x9 ( BODY FILE )
Indexing done!
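
To check at a glance which words landed under which metaname, the -T indexed_words output can be grouped with a short script. Again, just a sketch: the regular expression is inferred from the sample lines above, and the function names are hypothetical. The flag table is also an assumption. The sample shows that TITLE is 0x2 and META is 0x80, but it cannot tell BODY and FILE apart, so 0x8 for BODY and 0x1 for FILE are a guess that happens to fit.

```python
import re
from collections import OrderedDict

# Line format inferred from the sample output above ("Stuct" is spelled
# as it appears in that output).
ADDING = re.compile(
    r"Adding:\[(\d+):([^(\]]+)\((\d+)\)\]\s+'(.*?)'\s+Pos:(\d+)\s+Stuct:(0x[0-9A-Fa-f]+)"
)

# Assumed bit values: consistent with the sample, not confirmed from the source.
FLAGS = {0x01: "FILE", 0x02: "TITLE", 0x08: "BODY", 0x80: "META"}


def decode_structure(value):
    """Return flag names for a structure value, highest bit first."""
    return [name for bit, name in sorted(FLAGS.items(), reverse=True) if value & bit]


def words_by_metaname(output):
    """Group indexed words by the metaname they were added under."""
    groups = OrderedDict()
    for line in output.splitlines():
        match = ADDING.search(line)
        if match:
            groups.setdefault(match.group(2), []).append(match.group(4))
    return groups


# A few lines from the indexed_words output above, pasted in as test data.
SAMPLE_WORDS = """\
    Adding:[1:swishdefault(1)]   'guns'   Pos:3  Stuct:0xB ( BODY TITLE FILE )
    Adding:[1:swishdefault(1)]   'description'   Pos:9  Stuct:0x9 ( BODY FILE )
    Adding:[1:meta_description(17)]   'some'   Pos:10  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_author(18)]   'gordon'   Pos:24  Stuct:0x89 ( META BODY FILE )
    Adding:[1:meta_author(18)]   'jessop'   Pos:25  Stuct:0x89 ( META BODY FILE )
"""

if __name__ == "__main__":
    for metaname, words in words_by_metaname(SAMPLE_WORDS).items():
        print(metaname, words)
    print(decode_structure(0x89))
```

If everything shows up under swishdefault and nothing under your metanames, the tags are not being picked up.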



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Tue Jan 29 21:18:36 2002