Re: Comments on new SWISH::API proposals

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jan 14 2003 - 17:07:21 GMT

On Tue, 14 Jan 2003, Alex Lyons wrote:

> We currently use swish-e 2.2.2 on our site via variants of the
> supplied swish.cgi, but I'm thinking of writing something using the
> Perl module, so I'm looking with interest at the SWISH::API proposals
> in the current dev version.
> 
> Our site is split into about 9 main areas, so it seemed sensible to
> index each area separately and then give users the option of selecting
> which indexes to search in (all done nicely by swish.cgi).  In
> considering a move to a persistent Perl search engine, one of the main
> savings would, I presume, be that the indexes would remain open
> between queries.  But I'm not sure whether this will be possible using
> the proposed API.  The proposal, as I understand it, is:
> 
> $swish = SWISH::API->new('index1 index2 ...');
> $search = $swish->New_Search_Object;
> $results = $search->Execute($query);
> 
> If for each query the set of index files is likely to change, it looks
> like I'll have to go right back to reopening all the index files anew
> each time, which seems to negate some of the point of going for
> persistence.

You create a $swish object for each set of possible index files.

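  # Sort, then join, so the same set of files always yields the same key
  # (sort in scalar context does not return a usable string).
  my $index_string = join ' ', sort @index_files_to_open;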

  $open_indexes{$index_string} ||= SWISH::API->new( $index_string );
  # check errors..

  my $swish = $open_indexes{$index_string};


If you have a great many index files that can be selected in any
combination, then that may mean too many index files open at one time.  In
that case you might want some logic to keep only the X most recently used
$swish objects around.
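
A rough sketch of that "keep only the X last used" logic, not tested against
a real index.  The HandleCache package, its max and opener arguments, and the
stand-in opener are all made up for illustration; in real use the opener
would presumably be something like sub { SWISH::API->new( $_[0] ) }, with
error checking added.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SWISH::API): keeps at most "max"
# handles and drops the least recently used one when the limit is passed.
package HandleCache;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        max    => $args{max} || 5,
        opener => $args{opener},    # code ref: index string => handle
        handle => {},               # index string => handle
        order  => [],               # index strings, least recently used first
    }, $class;
}

sub get {
    my ( $self, @index_files ) = @_;

    # Sort so the same set of files always maps to the same key.
    my $key = join ' ', sort @index_files;

    # Move (or add) the key to the most-recently-used end of the list.
    @{ $self->{order} } = ( ( grep { $_ ne $key } @{ $self->{order} } ), $key );

    # Open the index set only if it is not already cached.
    $self->{handle}{$key} ||= $self->{opener}->($key);

    # Forget the least recently used handles once over the limit.  A handle
    # is only freed when nothing else still holds a reference to it.
    while ( @{ $self->{order} } > $self->{max} ) {
        my $oldest = shift @{ $self->{order} };
        delete $self->{handle}{$oldest};
    }

    return $self->{handle}{$key};
}

package main;

# Stand-in opener so the sketch runs without any index files.
my $cache = HandleCache->new(
    max    => 2,
    opener => sub { my ($index_string) = @_; return { opened => $index_string } },
);

my $swish = $cache->get( 'index2', 'index1' );    # cached under "index1 index2"
```

The grep over the order list is linear in the number of cached handles, which
should not matter for the handful of objects you would keep open.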

>  Also, I can't see the point of distinguishing between
> the $swish and the $search objects: one seems to be a superset of the
> other.

It's kind of a superset.  The $swish object is simply the index files
opened and their headers parsed.  It prepares the index files for
searching, and builds some linked lists to tie the index files together.

The search object contains all the parameters for a given query,
including sort order, limit parameters, and the query string.  Once you
have a search object you can run multiple queries using that search
object, and only change the query words on each search.  That saves the
time of having to recreate the search parameters on every search.

But really the point is that when the search object goes out of scope all
the memory used for setting up the search can be released without
destroying the $swish object.  And that multiple search objects can exist
at the same time (say, if you only provide a few pre-defined sort options).

> Ideally what I would like to see would be something like:
> 
> $index1 = SWISH::API->Open('index1');
> $index2 = SWISH::API->Open('index2');
> $search = SWISH::API->New_Search_Object($index1,$index2,...);
> $results = $search->Execute($query);
> 
> In this case I'd only have to recreate the $search object for each
> query, which I'd have to do anyway if the user changed the sort order,
> or whatever.

No, you can change the sort order on an existing search object.  You can
change everything; the limit parameters must be reset, but you can do
that without creating a new search object.

> The $index objects would be persistent, opened once at
> initialisation. Most of the methods you currently propose for the
> $swish object could become methods of the $search object.

It might be possible to do what you show above -- I'd have to look at the
code.  The only advantage of that is that you don't have index files
opened more than once.  For a small number of index files it probably would
not make any difference, but for a large number it might be an advantage.

Let me take a look.  Speak up if you don't hear back.


-- 
Bill Moseley moseley@hank.org
Received on Tue Jan 14 17:07:38 2003