Re: stemmer.c and swish-2.1.x

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 30 2000 - 19:22:55 GMT
At 10:46 AM 10/30/00 -0800, jmruiz@boe.es wrote:
>I forgot to mention one thing in my last post about 2.1.x.
>
>Finally, I have updated stemmer.c. Now it is thread safe
>and it also is independent of the string length or any other
>buffer length assumption.
>
>I have made a small modification of the function. Now the call
>changes how the parameter is passed. Now it looks like:
>
>int Stem( word, lenword ) /* redefined - Moseley 10/17/99 */
>   char **word;  /* in/out: the word stemmed */
>   int *lenword;  /* in/out: the length of word stemmed */
>
>So if more memory is needed, the word buffer will be reallocated.
>
>Eg:
>
>char *myword;
>int mywordlen;
>
>myword=emalloc(6);
>strcpy(myword,"hello");
>Stem(&myword,&mywordlen);
>
>So, if the stemmed word needs 7 bytes, the Stem function will 
>reallocate the buffer.

Hmm.  I'm not sure I understand the issue.  Maybe there are two issues.

Here's how it used to look:

int Stem_it( word, wordlen )
   char *word;  /* in/out: the word stemmed */
   int wordlen; /* in: length of word, to avoid strcat overflow */
   {
   int rule;    /* which rule is fired in replacing an end */

   /*  Hack to make sure Stem() doesn't stem the word into nonexistence */
   char saveword[MAXWORDLEN];
   if ( wordlen != MAXWORDLEN ) return( TRUE ); 
                              /* semi-graceful abort - SRE - 2/00 */
   strcpy( saveword, word );

First, I don't understand why wordlen needed to be passed in the first
place.  In search.c it just calls the stemmer like this:

     Stem(word, MAXWORDLEN);

So I'm missing the point of passing MAXWORDLEN just to check, inside the
function, that it is still the same value.  Seems like that's saying:

     if ( 2 != 2 ) { printf("we have a problem"); }

I'm also not clear on the change you are talking about now.  Is it to
protect against a situation where a stemmed word requires more memory than
the nonstemmed word?
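
If I'm reading it right, the pattern would be roughly the sketch below.
The name append_suffix and its buflen parameter are made up for
illustration and are not from stemmer.c, and I'm guessing that the length
argument means the allocated buffer size.  Plain realloc() stands in for
whatever swish's own allocator does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from stemmer.c: appends suffix to *word,
   growing the buffer first when it is too small.
   *buflen is the allocated size in bytes and is updated on growth.
   Returns 1 on success, 0 if realloc fails (*word is left intact). */
int append_suffix(char **word, int *buflen, const char *suffix)
{
    size_t need;

    assert(word != NULL && *word != NULL);
    assert(buflen != NULL && suffix != NULL);

    /* bytes required for the joined string plus the terminating NUL */
    need = strlen(*word) + strlen(suffix) + 1;

    if (need > (size_t)*buflen) {
        char *bigger = realloc(*word, need);
        if (bigger == NULL)
            return 0;
        *word = bigger;          /* the caller's pointer may move */
        *buflen = (int)need;
    }

    strcat(*word, suffix);
    return 1;
}
```

Called with the 6-byte "hello" buffer from your example and a one-letter
suffix, this would grow the buffer to 7 bytes, which is why the caller has
to pass the address of its pointer rather than the pointer itself.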

Seems like Stem() would be better as a function that returns the stemmed
word and leaves the passed parameter unmodified:
       stemmed_word = Stem( word );

But I'm no C programmer.
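
For what it's worth, here is a rough sketch of the return-a-new-string
form.  The names stem_copy and apply_rules are made up, and apply_rules is
only a placeholder that strips a trailing 's', not the real stemming rules.
It assumes the rules only ever shorten the word, so a copy the size of the
input is enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the real stemming rules (an assumption for this
   sketch): strips one trailing 's' from words longer than 3 letters. */
static void apply_rules(char *buf)
{
    size_t len = strlen(buf);

    if (len > 3 && buf[len - 1] == 's')
        buf[len - 1] = '\0';
}

/* Returns a freshly allocated, stemmed copy of word, or NULL if the
   allocation fails.  The caller's string is never modified, and the
   caller is responsible for free()ing the result. */
char *stem_copy(const char *word)
{
    char *out;

    assert(word != NULL);

    out = malloc(strlen(word) + 1);
    if (out == NULL)
        return NULL;

    strcpy(out, word);
    apply_rules(out);
    return out;
}
```

The caller then never has to care about buffer sizes at all, at the cost
of one allocation per word and having to free the result.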



Bill Moseley
mailto:moseley@hank.org
Received on Mon Oct 30 19:25:43 2000