Re: stemmer.c and swish-2.1.x

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 30 2000 - 20:06:46 GMT
This is the kind of boring talk that results in an increase in the number of
unsubscribe messages to Roy.  Sorry!

At 11:40 AM 10/30/00 -0800, jmruiz@boe.es wrote:
>Well, it is a memory and portability issue. Why do you need to 
>allocate MAXWORDLEN to stem a word? It is not necessary.
>If you avoid any reference to MAXWORDLEN, you will get more
>portable code that can be used outside swish-e.

I'll have to look at the code, I guess.

So in other words, you pass in a reference to the word, and a reference to
its max length.  Then if Stem() needs more space than available it will
malloc() a bigger chunk and return that (plus the new max length).  And I
assume Stem() would also call free() on the original memory.

Am I getting it?

If I am getting it, then one thing I'd be concerned about is whether something
in the calling code holds a reference to the original word (its address, of
course) from before Stem() reallocated it.  That pointer would then refer to
freed memory.
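
To make the interface described above concrete, here is a minimal sketch, not the actual stemmer.c code. The name stem_grow(), its return convention, and the toy rule (a word ending in "iz" gains a trailing 'e') are all assumptions made up for illustration. It uses realloc() rather than a separate malloc()/free() pair, which has the same effect on the caller's old pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface: *word is a malloc()ed buffer of *maxlen bytes.
 * Returns 0 on success, -1 if more memory was needed and not available
 * (in which case *word and *maxlen are left untouched). */
int stem_grow(char **word, size_t *maxlen)
{
    assert(word != NULL && *word != NULL && maxlen != NULL);

    size_t len = strlen(*word);

    /* Toy rule standing in for a real stemmer: "...iz" becomes "...ize". */
    if (len < 2 || strcmp(*word + len - 2, "iz") != 0)
        return 0;                      /* nothing to change */

    size_t needed = len + 2;           /* one new letter plus the NUL */
    if (needed > *maxlen) {
        char *bigger = realloc(*word, needed);
        if (bigger == NULL)
            return -1;                 /* old buffer is still valid */
        /* Any copy of the old address kept by the caller may now be stale. */
        *word = bigger;
        *maxlen = needed;
    }

    (*word)[len] = 'e';
    (*word)[len + 1] = '\0';
    return 0;
}
```

A caller would pass &word and &maxlen and must re-read both afterwards. Any other pointer saved from before the call has to be treated as invalid, which is exactly the concern raised above.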

Which is one reason why stemmed = Stem( word ) might be a better way to go.
That is, use a temporary buffer on the stack in Stem() to hold the new
stemmed word, then malloc() a new buffer right before returning, copy the
temporary buffer into it, and return the pointer to the stemmed word.
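
A rough sketch of that stemmed = Stem( word ) style follows. The name stem_copy(), the TMP_MAX bound, and the toy "strip a trailing ing" rule are assumptions for illustration only, not anything taken from swish-e.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TMP_MAX 256   /* assumed size of the temporary stack buffer */

/* Hypothetical: returns a newly malloc()ed stemmed copy of word, or NULL
 * if the word does not fit the temporary buffer or malloc() fails.
 * The input is never modified; the caller must free() the result. */
char *stem_copy(const char *word)
{
    assert(word != NULL);

    char tmp[TMP_MAX];
    size_t len = strlen(word);
    if (len >= sizeof tmp)
        return NULL;

    memcpy(tmp, word, len + 1);        /* work on the stack copy */

    /* Toy rule standing in for a real stemmer: drop a trailing "ing". */
    if (len > 5 && strcmp(tmp + len - 3, "ing") == 0) {
        len -= 3;
        tmp[len] = '\0';
    }

    char *out = malloc(len + 1);       /* exact-size result */
    if (out == NULL)
        return NULL;
    memcpy(out, tmp, len + 1);
    return out;
}
```

The original word's address never changes here, so pointers held by the caller stay valid. The price is that every call hands the caller a buffer it has to free().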

Of course, as we discussed some time back, I tried this in my Stemmer Perl
module, but ended up with a memory leak, as I'm not good enough at Perl's XS
to know how to free the memory...

Or the other way is simply to require that a stemmed word is never longer
than the original word.
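
If that guarantee holds, the stemmer can rewrite the caller's buffer directly and needs neither MAXWORDLEN nor malloc(). A minimal sketch, where stem_in_place() and its two toy suffix rules are assumptions for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: stems word in its own buffer and returns the new length.
 * Every rule only removes characters, so the result always fits. */
size_t stem_in_place(char *word)
{
    assert(word != NULL);

    size_t len = strlen(word);

    /* Toy rules standing in for a real stemmer. */
    if (len > 5 && strcmp(word + len - 3, "ing") == 0)
        len -= 3;                      /* "searching" -> "search" */
    else if (len > 3 && word[len - 1] == 's')
        len -= 1;                      /* "words" -> "word" */

    word[len] = '\0';                  /* truncate in place */
    return len;
}
```

The address of the word never changes and nothing has to be freed, so neither the dangling-pointer concern nor the leak comes up.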



Bill Moseley
mailto:moseley@hank.org
Received on Mon Oct 30 20:09:39 2000