Re: ExtractPath & metanames

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Mar 10 2004 - 21:23:33 GMT
On Wed, Mar 10, 2004 at 03:03:19PM -0600, Peter Ensch wrote:
> 1) I expected to capture 'this_word' from the path
>    /path/to/my/site/this_word/file.htm
>    but underscores are not included in the default
>    WordCharacters. The string was not captured or was
>    truncated.

You could use BuzzWords, but that would affect all metanames.

> 2) If stemming is turned on, this also affects what 
>    ExtractPath captures. I expected to capture 
>    'relnotes' from the path
>    /path/to/my/site/relnotes/file.htm
>    but got 'relnot' instead.

Another problem.  Would be nice to set the fuzzy mode on a per-metaname
basis.

> For my purposes it would have been better if ExtractPath
> stored literal text (not SWISH 'words'), but perhaps in 
> most cases this is what people want.

That happens at a higher up level -- as far as the indexing code knows
you are just adding a <meta name="links" content="this_word">.

It's just code -- so a hack is always possible.  In index.c look at
"index_path_parts" -- you could likely turn off stemming around the
indexstring() call.  The fuzzy_mode is in:

   sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode

so save that locally, then set it to FUZZY_NONE, call indexstring() and
then reset.

That's just off the top of my head.  No promises that it works...
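
A minimal sketch of the save / set to FUZZY_NONE / call / reset idea
described above.  It does not use the real Swish-e headers: the struct
types below are stand-ins that only mirror the field chain quoted in
this message, FUZZY_STEM_DEMO is a made-up enum value, and
indexstring_stub() is a hypothetical placeholder for indexstring(),
whose real signature is not shown here.  The "stemmer" is a toy that
strips a trailing "es".

```c
#include <assert.h>
#include <string.h>

/* Stand-in types: NOT the real Swish-e definitions.  They only
 * reproduce the chain sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode. */
typedef enum { FUZZY_NONE = 0, FUZZY_STEM_DEMO } FuzzyMode; /* FUZZY_STEM_DEMO is hypothetical */

typedef struct { FuzzyMode fuzzy_mode; } Stemmer;
typedef struct { Stemmer *stemmer; } FuzzyData;
typedef struct { FuzzyData *fuzzy_data; } Header;
typedef struct { Header header; } IndexFile;
typedef struct { IndexFile *indexf; } SWISH;

/* Holds the last "indexed" word so the effect can be inspected. */
char last_indexed[64];

/* Hypothetical placeholder for indexstring(): copies the word and,
 * when a fuzzy mode is active, applies a toy stem (drops a trailing "es"). */
void indexstring_stub(SWISH *sw, const char *word)
{
    size_t len = strlen(word);

    assert(len < sizeof last_indexed);
    strcpy(last_indexed, word);

    if (sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode != FUZZY_NONE
        && len > 2 && strcmp(word + len - 2, "es") == 0)
        last_indexed[len - 2] = '\0';
}

/* The hack itself: save the mode, disable it, index, then restore. */
void index_path_part_literal(SWISH *sw, const char *word)
{
    FuzzyMode *mode = &sw->indexf->header.fuzzy_data->stemmer->fuzzy_mode;
    FuzzyMode saved = *mode;      /* save locally */

    *mode = FUZZY_NONE;           /* turn stemming off */
    indexstring_stub(sw, word);   /* index the path part unstemmed */
    *mode = saved;                /* reset for the rest of the document */
}
```

With a stand-in SWISH whose mode is FUZZY_STEM_DEMO, calling
indexstring_stub() on "relnotes" stores "relnot", while
index_path_part_literal() stores "relnotes" and leaves the mode as it
found it.  Whether the same wrapping is safe inside the real
index_path_parts is untested.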

-- 
Bill Moseley
moseley@hank.org
Received on Wed Mar 10 13:23:33 2004