Re: pgswish - query swish-e index from PostgreSQL

From: Dobrica Pavlinusic <dpavlin(at)not-real.rot13.org>
Date: Sun Mar 06 2005 - 22:29:06 GMT
On Sun, Mar 06, 2005 at 08:55:43AM -0800, Bill Moseley wrote:
> On Sun, Mar 06, 2005 at 05:42:58AM -0800, Dobrica Pavlinusic wrote:
> > 
> > http://pgfoundry.org/projects/pgswish/

I knew that I had to write more documentation before announcing it :-)

> Cool. So the advantage is that you can use your existing code (that
> deals with results from a database) to do swish queries, correct?

Yes. Aside from that, you can replace your LIKE or ILIKE queries with
swish, or order your results by swishrank.

> I assume you can do joins on the table?

Yes. That was the primary motivation. Another was the ability to use
GROUP BY and aggregate functions on the results.

> What's the idea/difference between pgswish and pgswish2?

pgswish returns a fixed table, and the user can't choose which properties
to return. pgswish2, on the other hand, will be able to return any
property from swish. However, there is a trade-off.

pgswish (if the PostgreSQL query optimizer chooses to do so) can fetch
just some of the results from swish-e. pgswish2, on the other hand, will
always fetch all results from swish-e into PostgreSQL. This is imposed by
the SFRM_Materialize return mode which it uses.

Currently pgswish2 isn't functional. It's just a placeholder for a snippet
of code which demonstrates how to return an arbitrary row structure to
PostgreSQL.

> > I'm interested about comments and success/failure stories if anybody is
> > interested in using this extension.
> 
> Out of curiosity, I was trying to understand the code:
> 
> Is the index opened on every SELECT or just the very first time a
> SELECT is issued?

The index is opened on every SELECT query. One optimization that comes to
mind is a pool of open indexes in the PostgreSQL back-end process.
However, it dumps core on me often enough as it is, and I'm not in favor
of premature optimization. I also don't have any idea how to set up the
shared memory space (needed for such a pool) in the PostgreSQL back-end.

> Is there a way to get around having swish_handle, search,
> search_results, and sw_res global?  What happens if you wanted to
> search two indexes at the same time?  Or if you had a threaded
> application?

I suspect that they are global for a given session. I might be wrong, but
I haven't found exact references on this topic. PostgreSQL forks a new
back-end for my connection, so my variables should be local to that
connection.
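
The general POSIX behaviour behind that assumption can be checked with a
small sketch: after fork(), a write to a file-scope variable in the child
does not show up in the parent. The names global_state and
parent_value_after_child_write are hypothetical, and this says nothing
about PostgreSQL internals beyond the fact that a forked process gets its
own copy of such variables.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a file-scope global such as swish_handle. */
static int global_state = 0;

/* Fork a child that changes the global, then report what the parent
   still sees. Returns the parent's value, or -1 if fork/wait failed. */
static int parent_value_after_child_write(void)
{
    pid_t pid;
    int status;

    global_state = 1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        global_state = 42;          /* changes only the child's copy */
        _exit(global_state == 42 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));
    return global_state;            /* the parent's copy is untouched */
}
```

This only covers separate processes. Two searches issued inside the same
process would still share the same globals.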

> Should error_or_abort() return a value and then you SRF_RETURN_DONE()
> based on that?

You are absolutely right. I fixed that.
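
For readers following along, the shape of that change might look roughly
like the sketch below. It is not the actual pgswish source: toy_handle
and produce_rows are hypothetical stand-ins, and a real version would ask
the swish-e library for the error state instead of reading a struct
field.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a swish-e handle. */
typedef struct {
    int error;             /* non-zero when the last call failed */
    const char *message;   /* may be NULL */
} toy_handle;

/* Report an error if there is one. Returns 1 when the caller should
   stop producing rows, 0 when it is safe to continue. */
static int error_or_abort(const toy_handle *h)
{
    assert(h != NULL);
    if (!h->error)
        return 0;
    fprintf(stderr, "swish error: %s\n",
            h->message ? h->message : "(unknown)");
    return 1;
}

/* Caller pattern: act on the return value instead of relying on
   error_or_abort() to unwind. Returns the number of rows produced. */
static int produce_rows(const toy_handle *h, int available)
{
    if (error_or_abort(h))
        return 0;          /* in a real SRF: SRF_RETURN_DONE(funcctx) */
    return available;
}
```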

> Maybe there should be multiple functions for the different objects:
> error_or_abort_(swish|search|results|result) calls -- I think you can
> get the SW_HANDLE from each type of object to check for errors.
> 
> I've never been that happy with how the C API works compared to the
> Perl xs code.  The perl code has the advantage that when variables go
> out of scope (like by calling die or returning up the stack) that the
> DESTROY methods will cascade cleaning up the parent objects.  If you
> abort when working with a result then you really need to call
> Free_Results_Object() followed by Free_Search_Object() and finally
> SwishClose() (if it was a critical error).
> 
> So having multiple error_or_abort_* functions might make that easier.

Thanks for the info. Is there any advantage in having errors checked via
SW_SEARCH as opposed to SW_HANDLE? I just call Free_*_Object if I have a
handle. Is that enough?
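
The cleanup order described in the quoted text (results first, then the
search object, and the handle only after a critical error) can be
sketched with stand-in objects. toy_ctx, note and cleanup are
hypothetical names; the comments show where the real Free_Results_Object(),
Free_Search_Object() and SwishClose() calls would presumably go.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the handle / search / results objects. */
typedef struct {
    int have_handle, have_search, have_results;
    char log[8];   /* records the order of frees, for illustration */
} toy_ctx;

/* Append one step marker to the log. */
static void note(toy_ctx *c, char step)
{
    size_t n = strlen(c->log);
    assert(n + 1 < sizeof c->log);
    c->log[n] = step;
    c->log[n + 1] = '\0';
}

/* Release children before parents. Each step is skipped when the
   object was never created, so this is safe to call at any point. */
static void cleanup(toy_ctx *c, int critical)
{
    if (c->have_results) {             /* Free_Results_Object() here */
        note(c, 'R');
        c->have_results = 0;
    }
    if (c->have_search) {              /* Free_Search_Object() here */
        note(c, 'S');
        c->have_search = 0;
    }
    if (critical && c->have_handle) {  /* SwishClose() here */
        note(c, 'H');
        c->have_handle = 0;
    }
}
```

Keeping a single cleanup routine that checks which objects exist is one
alternative to several error_or_abort_* variants; which is nicer is a
matter of taste.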

> The call SwishResultPropertyStr() is not really thread safe.  That may
> not be a big deal for a number of reasons -- such as it being unlikely
> that it would be used in a threaded application and that there may be
> other overriding thread-unsafe code in swish-e core...
> 
> The "problem" is that SwishResultPropertyStr() looks up the property
> value string and stores it in a common area of the parent DB_RESULTS
> structure (which is shared by all indexes opened with SwishInit() ),
> and the address of that cached string is returned.  So in a threaded
> application the string might get changed by another thread during the
> call to SwishResultPropertyStr().
> 
> The reason this is done is so that the calling application doesn't
> need to worry about calling free() on the string.
> 
> If you look at perl/API.xs you will see that SwishProperty() (not
> SwishResultPropertyStr() ) method calls a lower-level
> getResultPropValue(), then copies the value into a Perl scalar (which
> then Perl can worry about cleaning up) and then calls
> freeResultPropValue(pv) to free the memory in swish before returning.

Ouch. Another thing to fix. Am I correct to assume that this problem could
be exposed by two swish-e queries running in parallel from different
PostgreSQL back-ends (but only if they are threads)?
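
The copy-then-release idea from the quoted text can be illustrated
without the swish-e library. In the sketch below, toy_property_str is a
hypothetical stand-in that mimics a lookup returning the address of a
shared buffer, and toy_property_copy takes a private copy immediately so
that a later lookup cannot change it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared cache described above: every
   lookup overwrites the same buffer and returns its address. */
static char shared_buf[64];

static const char *toy_property_str(const char *value)
{
    assert(value != NULL);
    strncpy(shared_buf, value, sizeof shared_buf - 1);
    shared_buf[sizeof shared_buf - 1] = '\0';
    return shared_buf;
}

/* Take a private copy right away. The caller owns the result and must
   free() it. Returns NULL on allocation failure. */
static char *toy_property_copy(const char *value)
{
    const char *p = toy_property_str(value);
    char *copy = malloc(strlen(p) + 1);
    if (copy != NULL)
        strcpy(copy, p);
    return copy;
}
```

The copy protects against later lookups overwriting the buffer. It does
not make the lookup itself atomic, so it would not be sufficient on its
own in a genuinely multi-threaded caller.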

> By the way, I like your coding style. ;)

After using too much Perl in the last few years (which makes my brain
dump directly executable), this was my journey back to C.

> Do you have any suggestions for improving either the Swish-e C API docs or
> the API itself?

Not at this time. I'm still trying to find my way around the current API :-)

-- 
Dobrica Pavlinusic               2share!2flame            dpavlin@rot13.org
Unix addict. Internet consultant.             http://www.rot13.org/~dpavlin
Received on Sun Mar 6 14:29:07 2005