Re: here goes a newbie for stepping on toes....

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jun 05 2001 - 06:10:30 GMT
Hi Jerry,

At 06:06 PM 06/04/01 -0700, Jerry Asher wrote:
>I am following the library spec found in 4444/SWISH-LIBRARY.html

That doc may be slightly outdated.  You would be better off looking at the
source code.

You should also get on the CVS list, as we don't want to flood this list
with a lot of detailed talk.
http://lists.sourceforge.net/lists/listinfo/swishe-cvs

>What's complicating my life is that there doesn't really seem to be a
>separable client library for SWISH-E.  Header files are enormous and
>define all sorts of structures that the client doesn't care about and that
>go way beyond what is called for in the SWISH-E library spec.  The header
>files detail all sorts of fields that the client most likely shouldn't have
>access to, in particular a RESULT.

As I said, the library is just an archive of all the modules in swish.  To
make swish-e, the swish.o module is linked with the library.  Embedding
swish into another application (like a swish-e server) means linking
with that same library.

But I agree that the client should have less access.  The original authors
probably had no idea that the code would someday be embedded in AOLserver or
run as a swish-e server.

>So for instance, I would have thought that a SwishNext routine would return 
>more opaque data structures.  Certainly a data structure consisting of ints 
>and strings and not pointers.

Sounds good.  I'm not sure how much work it would take to make that happen.
Aren't strings pointers in C?

>Stem is interesting.  It doesn't appear to be an interface/library function 
>at all, since it feels free to efree and emalloc (using the swish internal 
>memory routines) on the structures that are passed in (char **inword).  

I'm not sure I follow what you are asking.  Stem() is callable from the
library.  Almost everything uses the swish-e efree and emalloc routines, too.
If you are talking about using Stem() all by itself, then you would have to
change to malloc and free (or get rid of those calls completely).  That's
what I do for the SWISH::Stemmer perl modules.

>I would have thought a Stem library function would take in a string and maybe
>fill up a string buffer in return, but that it would certainly not use an
>internal memory routine on passed-in strings.

The code's been in a number of hands.  There was at one point a push to
make sure there were no buffer overruns, and one way to deal with that was
to just allocate more space when needed.  Frankly, I think some of that is
overkill (see recent discussion about stemmer.c on the CVS list), but in
some places it makes sense.  But in general, I think we could get by with
more stack variables of "reasonable" size and that's that.  And stemmer.c
would only reallocate if someone was indexing words of, say, 999
characters.  Probably not too likely.


>Anyway, there have been some other little gotchas too.  DEBUG_MASK needs to 
>be defined in the library even though it is only used for debugging the 
>indexing routines.

Well, that's not really true.  Right now there are only debugging hooks
placed in a few spots, because it's an experimental feature.  There are
plans to add more hooks, but no decision has been made yet on what the
actual implementation will be.  DEBUG_MASK is defined in swish.c, which
isn't in the library; that was just a convenient place for the example
code.

>So without meaning to step on anyone's toes, what's the state of the
>library?  Am I using an old/incorrect document?  How receptive would the
>project be to significant changes to SWISH-E intended to make building
>libraries easier?

It kind of sounds like you are confused about what the library is.  It's
just swish.  It's not a separate group of core functions to read the
database.  Maybe it would be nice if it were (it might have made my Apache
test server use less memory).

>What should I be doing?

What I'd suggest is this:  join the CVS/devl list as I mentioned above,
then write some test code, see what doesn't work for you, and we can see
what can be done based on your suggestions.  I've managed to embed swish
into both Perl programs and Apache, so you should be able to get some
basic test code up and running and then work from there.



Bill Moseley
mailto:moseley@hank.org
Received on Tue Jun 5 06:14:15 2001