Re: stemmer.c and swish-2.1.x

From: <jmruiz(at)not-real.boe.es>
Date: Mon Oct 30 2000 - 19:40:52 GMT
Hi Bill,

> 
> Humm.  I'm not sure I understand the issue.  Maybe there's two issues.
> 
> Here's how it used to look:
> 
> int Stem_it( word, wordlen )
>    char *word;  /* in/out: the word stemmed */
>    int wordlen; /* in: length of word, to avoid strcat overflow */
> {
>    int rule;    /* which rule is fired in replacing an end */
>
>    /* Hack to make sure Stem() doesn't stem the word into
>       nonexistence */
>    char saveword[MAXWORDLEN];
>
>    if ( wordlen != MAXWORDLEN )
>       return( TRUE );   /* semi-graceful abort - SRE - 2/00 */
>    strcpy( saveword, word );
> 
> First, I don't understand why wordlen needed to be passed in in the
> first place.  In search.c it just calls the stemmer like this:
> 
>      Stem(word, MAXWORDLEN);
> 

Well, it is a memory and portability issue. Why allocate MAXWORDLEN
bytes just to stem a word? It is not necessary.
If you avoid any reference to MAXWORDLEN, you get more portable
code that can be used outside swish-e.
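
A minimal sketch of that point, under stated assumptions: the `saveword` backup in the quoted `Stem_it` only needs as many bytes as the word it backs up, so it can be sized with strlen() at run time instead of MAXWORDLEN. `stem_with_backup` and `toy_rule` are hypothetical names, and the single toy rule (drop one trailing 's') stands in for the real rule table in stemmer.c. Because the toy rule only shortens words, no capacity parameter is needed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy rule, for illustration only: drop one trailing 's'.
 * Applied to the word "s" it leaves an empty string. */
static void toy_rule(char *word)
{
    size_t len = strlen(word);

    if (len > 0 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/* Stem `word` in place. The backup copy is sized from the word itself,
 * so no MAXWORDLEN buffer is needed. Returns 1 (TRUE) on success,
 * 0 (FALSE) if the backup could not be allocated. */
static int stem_with_backup(char *word)
{
    size_t size;
    char *saveword;

    assert(word != NULL);
    size = strlen(word) + 1;
    saveword = malloc(size);        /* only as big as this word */
    if (saveword == NULL)
        return 0;
    memcpy(saveword, word, size);

    toy_rule(word);

    /* Same intent as the old hack: never stem a word into nonexistence. */
    if (word[0] == '\0')
        memcpy(word, saveword, size);

    free(saveword);
    return 1;
}
```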

> 
> Now, I'm not clear on the change you are talking about now.  Is it to
> protect against a situation where a stemmed word requires more memory
> than the nonstemmed word?
> 
Right.
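
A sketch of such a guard, assuming the stemmer is told the capacity of the buffer it writes into: before replacing a word ending, it checks that the new ending fits. `replace_suffix` is a hypothetical helper, and the "bl" to "ble" example is modeled on a Porter-style rule; it is not necessarily one of stemmer.c's rules.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace the ending `from` of `word` with `to`,
 * where `word` lives in a buffer of `cap` bytes. Returns 1 if the
 * replacement was made, 0 if the word does not end in `from` or the
 * result would not fit (in both cases `word` is left unchanged). */
static int replace_suffix(char *word, size_t cap,
                          const char *from, const char *to)
{
    size_t len, from_len, to_len, stem_len;

    assert(word != NULL && from != NULL && to != NULL);
    len = strlen(word);
    from_len = strlen(from);
    to_len = strlen(to);

    if (from_len > len || strcmp(word + len - from_len, from) != 0)
        return 0;                       /* suffix not present */

    stem_len = len - from_len;
    if (stem_len + to_len + 1 > cap)
        return 0;                       /* would overflow the buffer */

    memcpy(word + stem_len, to, to_len + 1);   /* copies the NUL too */
    return 1;
}
```

With an 8-byte buffer, "troubl" becomes "trouble"; with a 7-byte buffer the call returns 0 and the word is untouched, instead of writing past the end.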

> Seems like Stem() would be better as a function, where the passed
> parameter isn't modified:
>        stemmed_word = Stem( word );
> 

This is also possible. But the old Stem function returns TRUE or
FALSE (although swish-e simply ignores this value).
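
Both ideas can coexist, as a minimal sketch of the function-style interface Bill suggests shows: the input is left untouched, the result is a newly allocated string sized from the input, and NULL takes the place of the old FALSE. The name `Stem_copy`, the `STEM_GROWTH` allowance, and the single "ies" to "y" rule are illustrative assumptions, not swish-e code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on how many bytes the rules may add to a word;
 * a real value would have to be derived from the stemmer's rule table. */
#define STEM_GROWTH 4

/* Hypothetical: return a newly allocated stemmed copy of `word`, or NULL
 * on failure (NULL takes the place of the old FALSE). `word` is never
 * modified; the caller must free() the result. */
char *Stem_copy(const char *word)
{
    size_t len;
    char *out;

    assert(word != NULL);
    len = strlen(word);
    out = malloc(len + STEM_GROWTH + 1);   /* sized from the word, not MAXWORDLEN */
    if (out == NULL)
        return NULL;
    memcpy(out, word, len + 1);

    /* Toy rule for illustration only: "ponies" -> "pony". */
    if (len > 3 && strcmp(out + len - 3, "ies") == 0)
        strcpy(out + len - 3, "y");

    return out;
}
```

A caller would write `stemmed_word = Stem_copy( word );`, check for NULL, and free() the result. The trade-off against the in-place TRUE/FALSE interface is one allocation per stemmed word.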

cu
Jose
Received on Mon Oct 30 19:45:13 2000