Re: ExtractPath & metanames

From: Peter Ensch <pensch(at)not-real.ti.com>
Date: Wed Mar 10 2004 - 21:04:28 GMT
On Wed, Mar 10, 2004 at 11:08:54AM -0800, Bill Moseley wrote:
> On Wed, Mar 10, 2004 at 12:34:23PM -0600, Peter Ensch wrote:
> 
> > I am concerned w/ the output however. The word names appear
> > truncated. I guess they are printf'ed for formatting purposes,
> > but this does limit their usefulness. Is there a way to 
> > prevent this w/out hacking on the source code?
> 
> I don't see that behavior.  The code has:
> 
>   printf("\n%s",resultword);
> 
> So that doesn't limit the length.
> 

Of course you are correct. 

I did discover the problem. Words captured as a result of an
ExtractPath regex have the same rules applied as any other 
indexed word.

In my case there were 2 unexpected factors at play:

1) I expected to capture 'this_word' from the path
   /path/to/my/site/this_word/file.htm
   but underscores are not included in the default
   WordCharacters, so the string was either truncated
   at the underscore or not captured at all.

2) If stemming is turned on, this also affects what 
   ExtractPath captures. I expected to capture 
   'relnotes' from the path
   /path/to/my/site/relnotes/file.htm
   but got 'relnot' instead.
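
To illustrate the first factor, here is a minimal Python sketch of how a
word-character setting can split a captured string. It is only a simulation:
the WORD_CHARS set and the split_words helper are hypothetical stand-ins,
not Swish-e's actual default WordCharacters or its real parsing code.

```python
import string

# Assumed character set for illustration: lowercase letters and digits only.
# This is a hypothetical stand-in, not Swish-e's real default WordCharacters.
WORD_CHARS = set(string.ascii_lowercase + string.digits)


def split_words(text, word_chars=WORD_CHARS):
    """Split text into words; any character outside word_chars is a separator."""
    words, current = [], []
    for ch in text.lower():
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A non-word character ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


captured = "this_word"  # the path segment an ExtractPath-style regex would capture

# With the assumed set, the underscore acts as a separator.
print(split_words(captured))

# With the underscore added to the set, the string survives as one word.
print(split_words(captured, WORD_CHARS | {"_"}))
```

If the indexer behaves like this sketch, adding the underscore to the
word-character set is what keeps 'this_word' intact. The stemming effect
in point 2 is a separate step and is not modeled here.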

For my purposes it would have been better if ExtractPath
stored the literal text rather than SWISH 'words', but perhaps
the current behavior is what most people want.

Regards,
Peter

-- 

^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
Peter Ensch,
pensch@ti.com           A-1140   (214) 480 2333
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
Received on Wed Mar 10 13:04:28 2004