Re: Index e-mail archives?

From: Ron Samuel Klatchko <rsk(at)not-real.brightmail.com>
Date: Wed Sep 22 1999 - 22:08:55 GMT
Shaibal Roy wrote:
> Does SWISH-E provide an API that I can use? My data is stored
> in a database and not on a file system. I would like to use
> index+search functionality of SWISH-E, but leave out the parts
> that assume a file system or http.

Yes, you can connect new access methods to the swish engine.  It's not
well documented (you're the first person to express interest in adding a
new access method since I virtualized the code).

What you need to do is create four functions: indexpath, vgetc, vsize
and parseconfline.  The first function is called with the starting
path/url/whatever to index.  Its job is to call countwords once for
each entity to index.

You call countwords with four parameters:
*  A void * pointer which identifies the entity being indexed; we'll get
back to this when we discuss vgetc and vsize.
*  A char * which is an identifier (path/url/whatever).  It's not
interpreted; swish just returns it during a search if the entity is
matched.
*  A char * which is the title.  If your document does not have an
identifiable title, you should send in the identifier.
*  An int which indicates whether swish should only index the title.

vgetc and vsize return the next character and the size, respectively,
of the entity being indexed.  Each takes a single parameter: the
void * pointer you passed into countwords().
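
A minimal sketch of what such a pair of readers might look like for
data held in memory.  The names mem_entity, mem_vgetc and mem_vsize are
hypothetical, and the int return types and the use of EOF to signal the
end of the entity are assumptions modelled on getc(), not taken from
the swish source.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>

/* Hypothetical handle: the thing the void * pointer refers to. */
struct mem_entity {
    const char *data;  /* entity contents */
    size_t len;        /* number of bytes in data */
    size_t pos;        /* next byte to hand out */
};

/* vgetc-style reader: next byte, or EOF once the entity is exhausted. */
static int mem_vgetc(void *vp)
{
    struct mem_entity *e = vp;

    if (e->pos >= e->len)
        return EOF;
    return (unsigned char)e->data[e->pos++];
}

/* vsize-style reader: total size of the entity in bytes. */
static int mem_vsize(void *vp)
{
    struct mem_entity *e = vp;

    return (int)e->len;
}
```

Casting through unsigned char keeps bytes above 127 from being confused
with EOF on platforms where char is signed.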

parseconfline is used if you want to add method-specific configuration.
When the main engine sees a configuration line it doesn't understand, it
calls your parseconfline with the line.  If you recognize it, return 1;
otherwise return 0 so that swish can report an unrecognized option.
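
One way the return-1-or-0 contract could look in practice.  The
"DbTable" directive, the db_table buffer and the function name
db_parseconfline are all made up for this sketch, and the single
char * parameter is an assumption.  Matching is case-sensitive to keep
the example short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical setting filled in by the "DbTable <name>" directive. */
static char db_table[64] = "";

/* Returns 1 if the line was recognized and consumed, 0 otherwise. */
static int db_parseconfline(char *line)
{
    static const char key[] = "DbTable";
    size_t klen = sizeof key - 1;

    /* Skip leading blanks. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* The keyword must match and be followed by whitespace. */
    if (strncmp(line, key, klen) != 0)
        return 0;
    if (line[klen] != ' ' && line[klen] != '\t')
        return 0;

    /* A missing value counts as "not recognized". */
    return sscanf(line + klen, "%63s", db_table) == 1;
}
```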

Once those are written, you need to hook your code into the main
engine.  You'll need to create an instance of struct
_indexing_data_source_def and add a pointer to that instance to
data_sources in methods.c.
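
The general pattern is a table of function pointers plus a list of
registered methods.  The sketch below shows that pattern only: struct
example_source_def, its field names, example_sources and find_source
are invented for illustration and are not the real layout of
_indexing_data_source_def or data_sources; check the swish headers and
methods.c for the actual definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-method descriptor: a name plus the four callbacks. */
struct example_source_def {
    const char *id;
    void (*indexpath_fn)(char *path);
    int (*vgetc_fn)(void *vp);
    int (*vsize_fn)(void *vp);
    int (*parseconfline_fn)(char *line);
};

/* Placeholder callbacks; a real method would do actual work here. */
static void noop_indexpath(char *path) { (void)path; }
static int noop_vgetc(void *vp) { (void)vp; return -1; }
static int noop_vsize(void *vp) { (void)vp; return 0; }
static int noop_parseconfline(char *line) { (void)line; return 0; }

/* One instance per access method. */
static struct example_source_def db_source = {
    "db", noop_indexpath, noop_vgetc, noop_vsize, noop_parseconfline
};

/* NULL-terminated list of registered methods. */
static struct example_source_def *example_sources[] = { &db_source, NULL };

/* Look up a method by id; returns NULL when nothing matches. */
static struct example_source_def *find_source(const char *id)
{
    size_t i;

    for (i = 0; example_sources[i] != NULL; i++)
        if (strcmp(example_sources[i]->id, id) == 0)
            return example_sources[i];
    return NULL;
}
```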

That should be enough to get you started.  Look in http.c or fs.c to see
how the current code works.

Finally, if you have any questions, make sure to post them to the group.

moo
------------------------------------------------------------
           Ron Samuel Klatchko - Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Wed Sep 22 15:22:07 1999