
Re: swish-e search difficulties

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue May 11 2004 - 19:52:06 GMT
On Tue, May 11, 2004 at 12:24:35PM -0700, Chris Kantarjiev wrote:
> I'm indexing a mail archive (one file per message) and searching
> with swish.cgi. (I'm running 2.4.1.) It was recently pointed
> out to me that "Subject & Body" searches don't find all the
> messages that "Subject" does - that is, if the keyword only
> appears in the subject field, which becomes swishtitle, it
> isn't found by Subject & Body. 

That's due to the way your program converts the mail messages.

See, when swish-e indexes an HTML document it indexes the <title> text
under the "swishdefault" meta name (and flags the words as being in the
title so they rank higher).

So a search like:

   swish-e -w foo

will find foo in the title as well as in the body.
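To make that concrete, here is a toy model in Python. It is not swish-e's code: the (metaname, word) index, the title flag, and the names index_html and search are hypothetical simplifications. It indexes <title> and <body> words under swishdefault, and meta-tag words only under their own metaname. For contrast, it includes a document whose "title" exists only as a swishtitle meta tag, which is the situation in the quoted message below.

```python
from collections import defaultdict


def index_html(index, doc_id, title="", body="", metas=None):
    """Add one parsed document to a toy index keyed by (metaname, word)."""
    # <title> words: indexed under swishdefault, flagged as title words
    # (in swish-e that flag boosts rank; this toy only records it).
    for word in title.lower().split():
        index[("swishdefault", word)].add((doc_id, True))
    # <body> words: also indexed under swishdefault, without the flag.
    for word in body.lower().split():
        index[("swishdefault", word)].add((doc_id, False))
    # <meta name=... content=...> words: indexed only under that metaname.
    for name, content in (metas or {}).items():
        for word in content.lower().split():
            index[(name, word)].add((doc_id, False))


def search(index, word, metaname="swishdefault"):
    """Return sorted ids of documents containing word under metaname."""
    hits = index.get((metaname, word.lower()), set())
    return sorted({doc_id for doc_id, _is_title in hits})


index = defaultdict(set)
index_html(index, "a.html", title="foo report", body="nothing here")
index_html(index, "b.html", body="more foo")
index_html(index, "c.html", body="body text", metas={"swishtitle": "foo only here"})

print(search(index, "foo"))                # ['a.html', 'b.html']
print(search(index, "foo", "swishtitle"))  # ['c.html']
```

In this model a plain search finds foo in a.html's title and b.html's body, but never sees c.html, because its word lives only under swishtitle.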

>    metanames       => [qw/swishdefault swishtitle from all/],
>    name_labels     => {
>        swishrank   =>  'Rank',
>        all         =>  'Entire message',
>        swishtitle  =>  'Subject Only',
>        from        =>  "Poster's Email",
>        date        =>  'Message Date',
>        swishdefault  =>  'Subject & Body',
>    },

And there you are saying to search "swishdefault" for "Subject & Body".

But...

> <title>
> 
> </title>
> <meta name="precedence" content="list">
> <meta name="swishtitle" content="Girls Aloud's year at the top">
> <meta name="to" content="Name <your@name.here>">
> <meta name="sender" content="your@name.here">
> <meta name="date" content="1066685834">
> <meta name="from" content="Another Name <my@name.here>">
> <meta name="received" content="by wolfe.bbn.com (Postfix, from userid 13274)">
> </head><body>

There you say to index the title (Girls Aloud's year at the top) only
under the metaname swishtitle.

Indexing with -T indexed_words will show you the difference and why it's
not working the way you want.

Searching "-w foo" or "-w swishdefault=foo" doesn't also search
swishtitle; it only searches swishdefault.  It just happens that <title>
text gets indexed under swishdefault along with <body> text.
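One way to fix this on the conversion side, sketched in Python under stated assumptions: message_to_html is a hypothetical stand-in for your converter, the message is single-part plain text, and only Subject and From are carried over. It puts the Subject into <title>, so swish-e indexes it under swishdefault (flagged as title text), and keeps the swishtitle meta tag, which should leave the existing "Subject Only" search working. Every field is HTML-escaped so angle brackets in addresses or the body aren't parsed as tags.

```python
import html
from email import message_from_string


def message_to_html(raw_message):
    """Hypothetical converter: render one plain-text mail message as HTML."""
    msg = message_from_string(raw_message)
    # Escape header values; quote=True also makes them safe inside content="".
    subject = html.escape(msg.get("Subject", ""), quote=True)
    sender = html.escape(msg.get("From", ""), quote=True)
    # Assumption: single-part message; multipart handling is left out.
    body = "" if msg.is_multipart() else msg.get_payload()
    return (
        "<html><head>\n"
        # Subject in <title>: indexed under swishdefault as title text.
        f"<title>{subject}</title>\n"
        # Kept so searches on the swishtitle metaname still match.
        f'<meta name="swishtitle" content="{subject}">\n'
        f'<meta name="from" content="{sender}">\n'
        "</head><body>\n"
        f"<pre>{html.escape(body)}</pre>\n"
        "</body></html>\n"
    )
```

If changing the converter isn't an option, a query that ORs the two metanames (swishdefault=foo or swishtitle=foo) is the search-side alternative.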

> <http://news.bbc.co.uk/1/low/entertainment/tv_and_radio/3207926.stm>
> <http://news.bbc.co.uk/1/low/england/3207822.stm>

You might want to HTML-escape those so they aren't parsed as tags, at
least if you want the words in those links to be indexed.
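A small Python illustration of why escaping matters. The regular expression is a crude, hypothetical stand-in for an HTML parser (anything between < and > is dropped as a tag, and entities are decoded afterwards); it is not how swish-e's parser is implemented, but it shows the same effect on an unescaped <http://...> link.

```python
import html
import re

# Crude stand-in for an HTML parser: treat anything in <...> as a tag.
TAG = re.compile(r"<[^>]*>")


def visible_text(html_text):
    """Drop tag-like spans, then decode entities such as &lt; and &gt;."""
    return html.unescape(TAG.sub(" ", html_text)).split()


link = "<http://news.bbc.co.uk/1/low/england/3207822.stm>"
print(visible_text(link))               # [] : the link vanished as a "tag"
print(visible_text(html.escape(link)))  # the link text survives
```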

-- 
Bill Moseley
moseley@hank.org
Received on Tue May 11 12:52:07 2004