
SwishFuzzyWordError() and missing stemmer constants

From: Antony Dovgal <antony(at)not-real.zend.com>
Date: Mon Jan 29 2007 - 07:52:48 GMT

Hello all.

According to the documentation, the return values of SwishFuzzyWordError() are
defined in src/stemmer.h, and this is true.
However, this makes it impossible to use these values, because
stemmer.h is not a public header and is used only internally.
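Until these constants are public, a client that wants to interpret the status has to duplicate them by hand. A minimal sketch, assuming the enum exactly as it appears in src/stemmer.h and in the attached patch; `stem_status_name()` is a hypothetical helper, not part of the Swish-e API:

```c
/* Local copy of the STEM_RETURNS values from src/stemmer.h (as in the
 * attached patch). Duplicating them like this is what a public header
 * would make unnecessary: if stemmer.h changes, this copy goes stale
 * without any compiler warning. */
typedef enum {
    STEM_OK,            /* 0 */
    STEM_NOT_ALPHA,     /* not all alpha */
    STEM_TOO_SMALL,     /* word too small to be stemmed */
    STEM_WORD_TOO_BIG,  /* word is too large to stem */
    STEM_TO_NOTHING     /* word stemmed to the null string */
} STEM_RETURNS;

/* Hypothetical helper: map the int returned by SwishFuzzyWordError()
 * to a readable name. -1 is what the function returns for a NULL handle. */
const char *stem_status_name(int status)
{
    switch (status) {
    case -1:                return "NULL fuzzy word handle";
    case STEM_OK:           return "STEM_OK";
    case STEM_NOT_ALPHA:    return "STEM_NOT_ALPHA";
    case STEM_TOO_SMALL:    return "STEM_TOO_SMALL";
    case STEM_WORD_TOO_BIG: return "STEM_WORD_TOO_BIG";
    case STEM_TO_NOTHING:   return "STEM_TO_NOTHING";
    default:                return "unknown status";
    }
}
```

A caller would use it as `stem_status_name(SwishFuzzyWordError(fw))` when logging.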

Also, it's not really clear whether one should use this function at all, or whether it is discouraged/deprecated/etc.
The documentation of SwishFuzzyWordError() sheds almost no light on this:

"Not all stemmers set this value correctly." Well, this means at least some of them
DO return correct values. That's better than nothing.
Maybe it's time to fix the ones that return incorrect values?

"But since SwishFuzzyWordList() will return a valid string regardless of the return value,
you can often just ignore this setting. That's what I do." So how often should I ignore it? =)
I mean, if the return value of this function should be ignored, then the function itself is useless.
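Read literally, the documented advice treats the status as diagnostic only: the word list is used either way, and a non-zero status is merely reported. A minimal sketch of that pattern, with the status and the word list passed in as plain values so it does not depend on the library; it assumes the list from SwishFuzzyWordList() is NULL-terminated, and `collect_fuzzy_words()` is a hypothetical name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: "use the word list, treat the status as advisory".
 * `status` is the int from SwishFuzzyWordError(); `words` is the list
 * from SwishFuzzyWordList(), assumed NULL-terminated.
 * Copies up to `max` word pointers into `out` and returns the count.
 * A non-zero status is reported on `log` (if not NULL) but does not
 * change which words are returned. */
size_t collect_fuzzy_words(int status, const char *const *words,
                           const char **out, size_t max, FILE *log)
{
    size_t n = 0;

    if (status != 0 && log != NULL)   /* 0 is STEM_OK */
        fprintf(log, "stemmer status %d; using word list anyway\n", status);

    /* Check the bound before reading words[n]. */
    while (words != NULL && n < max && words[n] != NULL) {
        out[n] = words[n];
        n++;
    }
    return n;
}
```

In this pattern the status only ever affects logging, which is exactly why it is hard to say what the function is for.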

Hence the question:
would you accept a patch that exports these constants in the public header (and changes the
function prototype accordingly), or should I forget about SwishFuzzyWordError()?
The attached diff is against current CVS.
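One detail of the attached patch: for a NULL handle, the patched SwishFuzzyWordError() still does `return -1;`, and -1 is not one of the STEM_RETURNS enumerators. C permits the conversion, but since every enumerator is non-negative a compiler may pick an unsigned type for the enum, in which case -1 is stored as a large positive value. One possible refinement, not part of the patch, is to name that case explicitly. A minimal sketch; `STEM_INVALID_HANDLE`, `fuzzy_word_stub` and `fuzzy_word_error_stub()` are hypothetical stand-ins, not Swish-e identifiers:

```c
#include <stddef.h>

/* Variant of the patch's enum with an explicit enumerator for the
 * NULL-handle case. STEM_INVALID_HANDLE is hypothetical: it is not in
 * stemmer.h or in the attached patch. Listing it first with the value -1
 * forces a signed type and leaves STEM_OK at 0, so every existing code
 * keeps its current value. */
typedef enum {
    STEM_INVALID_HANDLE = -1,
    STEM_OK,
    STEM_NOT_ALPHA,
    STEM_TOO_SMALL,
    STEM_WORD_TOO_BIG,
    STEM_TO_NOTHING
} STEM_RETURNS;

/* Hypothetical stand-in for FUZZY_WORD: only the field used here. */
typedef struct {
    STEM_RETURNS error;
} fuzzy_word_stub;

/* Same logic as the patched SwishFuzzyWordError(), except that the
 * NULL case returns a named enumerator instead of a bare -1. */
STEM_RETURNS fuzzy_word_error_stub(const fuzzy_word_stub *fw)
{
    if (!fw)
        return STEM_INVALID_HANDLE;
    return fw->error;
}
```

Because the value stays -1, existing callers that compare the result (cast to int, as libtest.c does) against -1 would keep working.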

Thanks in advance.

-- 
Wbr, 
Antony Dovgal


Attachment: stemmer_constants.diff.txt

Index: src/libtest.c
===================================================================
RCS file: /cvsroot/swishe/swish-e/src/libtest.c,v
retrieving revision 1.16
diff -u -p -d -r1.16 libtest.c
--- src/libtest.c	12 May 2005 15:41:05 -0000	1.16
+++ src/libtest.c	28 Jan 2007 22:24:53 -0000
@@ -523,7 +523,7 @@ static void stem_it( SW_RESULT r, char *
     printf(" [%s] : ", word );
     
     fw = SwishFuzzyWord( r, word );
-    printf(" Status: %d", SwishFuzzyWordError(fw) );
+    printf(" Status: %d", (int)SwishFuzzyWordError(fw) );
     printf(" Word Count: %d\n", SwishFuzzyWordCount(fw) );
 
     printf("   words:");
Index: src/stemmer.c
===================================================================
RCS file: /cvsroot/swishe/swish-e/src/stemmer.c,v
retrieving revision 1.30
diff -u -p -d -r1.30 stemmer.c
--- src/stemmer.c	12 Nov 2006 02:52:39 -0000	1.30
+++ src/stemmer.c	28 Jan 2007 22:24:54 -0000
@@ -495,12 +495,12 @@ int SwishFuzzyWordCount( FUZZY_WORD *fw 
 
 /* Returns the integer value of the error */
 
-int SwishFuzzyWordError( FUZZY_WORD *fw )
+STEM_RETURNS SwishFuzzyWordError( FUZZY_WORD *fw )
 {
     if ( !fw )
         return -1;
 
-    return (int)fw->error;
+    return fw->error;
 }
 
 /* Frees the word */
Index: src/swish-e.h
===================================================================
RCS file: /cvsroot/swishe/swish-e/src/swish-e.h,v
retrieving revision 1.15
diff -u -p -d -r1.15 swish-e.h
--- src/swish-e.h	14 Jul 2005 17:02:34 -0000	1.15
+++ src/swish-e.h	28 Jan 2007 22:24:54 -0000
@@ -57,6 +57,14 @@ typedef enum {
     SWISH_HEADER_ERROR /* must check error in this case */
 } SWISH_HEADER_TYPE;
 
+typedef enum {
+    STEM_OK,
+    STEM_NOT_ALPHA,     /* not all alpha */
+    STEM_TOO_SMALL,     /* word too small to be stemmed */
+    STEM_WORD_TOO_BIG,  /* word is too large to stem, result would be too large */
+    STEM_TO_NOTHING    /* word stemmed to the null string */
+} STEM_RETURNS;
+
 typedef union
 {
     const char           *string;
@@ -182,7 +190,7 @@ SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r
 SW_FUZZYWORD SwishFuzzify( SW_HANDLE sw, const char *index_name, char *word );
 const char **SwishFuzzyWordList( SW_FUZZYWORD fw );
 int SwishFuzzyWordCount( SW_FUZZYWORD fw );
-int SwishFuzzyWordError( SW_FUZZYWORD fw );
+STEM_RETURNS SwishFuzzyWordError( SW_FUZZYWORD fw );
 void SwishFuzzyWordFree( SW_FUZZYWORD fw );
 const char *SwishFuzzyMode( SW_RESULT r );
 


Received on Sun Jan 28 23:52:55 2007