On Wed, Sep 03, 2003 at 08:11:45AM -0700, John P. Rouillard wrote:
> Hmm, 10 is swishtitle. Weird. I wonder why it's not showing up under
> swishdefault, since swishtitle and swishdefault should mirror each
> other, right?
Yes, title words get indexed under the swishdefault metaname with the
TITLE flag set (and under swishtitle too, when the config defines it as
a metaname):
moseley@bumby:~$ cat 1.html
<html>
<head><title>Title</title>
</head>
<body>
body
</body>
</html>
moseley@bumby:~$ cat c
metanames swishtitle
moseley@bumby:~$ swish-e -i 1.html -c c -T indexed_words -v0
Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishtitle(10)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:swishdefault(1)] 'body' Pos:5 Stuct:0x9 ( BODY FILE )
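To check a long -T indexed_words dump for which metanames a given word landed under, the "Adding:" lines can be grouped by word. This is only a rough sketch in Python: the regular expression is inferred from the three lines shown above, not from any documented format, and metanames_by_word is a made-up helper name.

```python
import re
from collections import defaultdict

# Matches debug lines shaped like the ones above, for example:
#   Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
# The pattern is an assumption based on that sample output only.
ADDING = re.compile(
    r"Adding:\[(?P<file>\d+):(?P<meta>\w+)\((?P<id>\d+)\)\]\s+"
    r"'(?P<word>[^']*)'\s+Pos:(?P<pos>\d+)"
)


def metanames_by_word(lines):
    """Map each indexed word to the set of metanames it was added under."""
    seen = defaultdict(set)
    for line in lines:
        match = ADDING.search(line)  # search() also tolerates a "> " quote prefix
        if match:
            seen[match.group("word")].add(match.group("meta"))
    return dict(seen)


sample = [
    "Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )",
    "Adding:[1:swishtitle(10)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )",
    "Adding:[1:swishdefault(1)] 'body' Pos:5 Stuct:0x9 ( BODY FILE )",
]
result = metanames_by_word(sample)
```

A word that is listed under swishtitle but never under swishdefault would stand out in the resulting dictionary.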
> What is weird is that I am seeing this on two other indexes as
> well. In one case it's indexed under metaname id 11, which is also
> swishtitle. This is weird. I am spidering for the other two indexes,
> and the hypermail program is producing valid HTML, but it's not being
> indexed under swishdefault.
The trick is to find the single document where it's not working and then
index that by itself and narrow down the document and the config until
you see where it's breaking.
> I have tried:
>
> % /tools/swish_e-2.4.0_pr1/share/doc/swish-e/examples/prog-bin/\
> index_hypermail.pl /data/www/mailing-lists/admin/0016.html > test.html
>
> % /tools/swish_e-2.4.0_pr1/bin/swish-e -i test.html -T indexed_words
But you are not using your config file there. If you specify the config
file with -c, does it still work?
>
> Indexing Data Source: "File-System"
> Indexing "test.html"
> ...
> Adding:[1:swishdefault(1)] 'guest' Pos:172 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'guest' Pos:206 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'guest' Pos:235 Stuct:0x9 ( BODY FILE )
> Adding:[1:swishdefault(1)] 'guest' Pos:245 Stuct:0x9 ( BODY FILE )
> Which shows that guest is swishdefault.
>
> % /tools/swish_e-2.4.0_pr1/bin/swish-e -w guest
> # SWISH format: 2.4.0-pr1
> # Search words: guest
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.001 seconds
> # Run time: 0.023 seconds
> 1000 test.html "TWiki security setup." 3085
>
> So the simple test case works. Doing a guest search on the entire
> directory tree returns no hits. My config file is:
So if you index the document by itself it works but as part of the full
indexing it doesn't?
I'd use the brute-force method: write all of index_hypermail.pl's
output to a file, index that, and confirm that it doesn't work. Then
I'd just divide up that large file until I found the reason why it's
not working.
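The dividing step can be sketched as a simple halving search. This is a minimal Python sketch, not a tested tool: find_culprit and fails are hypothetical names, and it assumes that a single document triggers the problem and that any batch containing that document reproduces it. In real use, fails would index the given batch with swish-e and report whether the search still comes back empty; here it is faked with a toy predicate.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(docs: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> T:
    """Return the one item in docs that makes fails() true, by repeated halving.

    Assumes exactly one culprit and that fails(batch) is true whenever the
    batch contains it.
    """
    if not docs or not fails(docs):
        raise ValueError("the full set does not reproduce the problem")
    candidates = list(docs)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the problem.
        candidates = first_half if fails(first_half) else candidates[mid:]
    return candidates[0]


# Toy stand-in for "index this batch and see whether the search breaks".
docs = [f"{n:04d}.html" for n in range(32)]
culprit = find_culprit(docs, lambda batch: "0016.html" in batch)
print(culprit)  # prints 0016.html
```

If more than one document is at fault, or the failure depends on a combination of documents, the halving will not converge cleanly, and splitting by hand is the fallback.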
--
Bill Moseley
moseley@hank.org
Received on Wed Sep 3 18:33:27 2003