Re: New versions of swish-e 2.x

From: Bill Moseley <moseley(at)>
Date: Wed Nov 15 2000 - 19:28:44 GMT
At 11:01 AM 11/15/00 +0100, wrote:
>>  i=0;
>>  if(!lenword) word = (char *)emalloc((lenword=MAXWORDLEN) +1);
>> What's the reason for the if ( !lenword )?  Isn't !lenword always true
>> there?
>lenword is 0 the first time the function is executed. It is a static var, 
>so, its value is preserved between calls.

See, I told you my C was poor.  I didn't see the "static" there.  So
basically, you are creating a place to hold the word, but it can grow if
needed (which is unlikely).
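
For anyone reading along, here is a minimal sketch of that pattern as I understand it. It is not the swish-e code: copy_word and INITIAL_WORD_LEN are made-up names, and plain malloc()/realloc() stand in for swish-e's emalloc() wrappers.

```c
#include <stdlib.h>
#include <string.h>

#define INITIAL_WORD_LEN 8  /* hypothetical; kept small so growth is easy to trigger */

/* Copies src into a buffer that persists between calls and grows on demand.
   Returns the buffer, or NULL if allocation fails. The caller must not free it. */
char *copy_word(const char *src)
{
    static char *word = NULL;   /* static: value is preserved between calls */
    static size_t lenword = 0;  /* stays 0 only until the first allocation */
    size_t need = strlen(src);

    if (!lenword) {             /* true on the first call only */
        word = malloc(INITIAL_WORD_LEN + 1);
        if (!word)
            return NULL;
        lenword = INITIAL_WORD_LEN;
    }
    if (need > lenword) {       /* grow only when the word does not fit */
        char *tmp = realloc(word, need + 1);
        if (!tmp)
            return NULL;
        word = tmp;
        lenword = need;
    }
    memcpy(word, src, need + 1); /* copy including the terminating NUL */
    return word;
}
```

The buffer is allocated once and never freed, and each call overwrites the previous word, so a caller that needs to keep a result has to copy it.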

>Right, for me MAXWORDLEN is not really the max length of the 
>word, it is the size of the initial buffer to store it. If it is big enough 
>(eg 1000), you will not need to reallocate the size of buffer (faster). 
>Anyway, you do not have to worry about the max length of the word, 
>if it is 1001, one extra byte will be allocated. I have tried to make the 
>code more secure to avoid buffer overruns. Imagine that someone 
>calls your CGI script with a word of 1001 characters which is greater 
>than MAXWORDLEN. Well, we can do two things to handle it:
>- Truncate the word up to MAXWORDLEN like swish-e 1.3.3 (1.3.2 + 
>- Reallocate the word like swish-e 2.x
>I prefer the second one.
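
The two options listed above can be put side by side in a rough sketch. The function names and signatures here are hypothetical and are not taken from either swish-e version; they only illustrate the difference in behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: truncate. dst must have room for cap + 1 bytes.
   Anything past cap characters is silently dropped. */
void store_truncated(char *dst, size_t cap, const char *src)
{
    size_t n = strlen(src);
    if (n > cap)
        n = cap;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Option 2: reallocate. *buf holds *cap + 1 bytes, or is NULL with *cap == 0.
   The buffer grows to fit the word. Returns 0 on success, -1 on failure. */
int store_grown(char **buf, size_t *cap, const char *src)
{
    size_t n = strlen(src);
    if (*buf == NULL || n > *cap) {
        char *tmp = realloc(*buf, n + 1);
        if (!tmp)
            return -1;          /* the old buffer is left untouched */
        *buf = tmp;
        *cap = n;
    }
    memcpy(*buf, src, n + 1);   /* copy including the terminating NUL */
    return 0;
}
```

Truncation never allocates but changes the word; reallocation keeps the word intact but adds a failure path and a buffer that someone has to free.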

Ok.  I'd hope that people check these things before passing off to Swish...

Anyway, you are correct as it's a matter of programming style.  I tend to
like automatic variables on the stack as I imagine that they are faster to
create (just move the stack pointer) and faster to access since the
compiler knows where they are, and no need to worry about malloc() and
such.  But that's just me.  

In either case you have to worry about buffer overruns.  And although the
stack is probably smaller than the heap, I suppose in both cases you have
to set some upper limit, too.  It also means people coding in the future
must be aware that they may find a word longer than MAXWORDLEN.  But it's
probably unlikely that there is such a long word.
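
To make the stack-based style concrete, here is a small sketch with an explicit upper limit. WORD_LIMIT and matches_lowercased are hypothetical names, and the lowercase comparison is only a stand-in for whatever work is done on the word.

```c
#include <ctype.h>
#include <string.h>

#define WORD_LIMIT 1000  /* hypothetical upper limit on word length */

/* Lowercases word into an automatic buffer and compares it with expected.
   Returns 1 on a match, 0 on a mismatch, -1 if word exceeds WORD_LIMIT. */
int matches_lowercased(const char *word, const char *expected)
{
    char buf[WORD_LIMIT + 1];   /* automatic: lives on the stack, no malloc() */
    size_t n = strlen(word);
    size_t i;

    if (n > WORD_LIMIT)
        return -1;              /* reject the word rather than overrun buf */
    for (i = 0; i < n; i++)
        buf[i] = (char)tolower((unsigned char)word[i]);
    buf[n] = '\0';
    return strcmp(buf, expected) == 0;
}
```

The length check is the part that cannot be skipped: without it, this version has exactly the overrun the reallocating version was written to avoid.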

>Well, I do not use stemmer.c, and you know it much better. So you 
>are probably right; in this module it makes no sense to do these 
>things with the word buffers. I can go back to the original stemmer.c 
>schema (using MAXWORDLEN) to avoid calling the allocation 

Either way.  Most of my concern is because I'm having a hard time with the
XS code in my Perl stemmer module, trying to figure out how to create a
mortal Perl variable and free() anything Stem() creates to avoid the
leak.  You don't need to program down to my level of understanding ;)

Bill Moseley
Received on Wed Nov 15 19:30:24 2000