Re: using the library: thread/process safety and

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Jun 02 2001 - 18:55:38 GMT
At 10:53 AM 06/02/01 -0700, Jerry Asher wrote:
>I want to create an AOLserver interface to swish-e.  Simplistically, 
>AOLserver is a GPL'd C based, multithreaded Tcl extension webserver.  (I 
>say that in that it uses Tcl 8.4 and is written in the same style as Tcl 
>itself.)  In AOLserver, each web request is responded to by a different 
>thread.  Each thread gets its own Tcl interpreter AND there is a way for 
>each thread to share a global Tcl data structure (basically a hash table.)

Someone who knows more about threaded applications will have to answer that.
(That might be you!)

Like I said, there's been a push to use a "session" in swish, which I think
simply means that swish is supposed to create ALL its data structures at the
SwishOpen, and free them at the SwishClose.  So this means that a single
thread should be able to run multiple queries at the same time without
conflicts.  And if that's true, then I would think that it would work
threaded.  Again, I'm guessing since I know nothing about threaded apps.
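
A minimal sketch of the "session" idea in Python, not swish code. SearchSession, run_query, and the tiny INDEX dictionary are all made up for illustration. The point it tries to show is that when every piece of search state lives on the handle created at open time, two threads can each run the open/search/next/close loop without stepping on each other.

```python
import threading


class SearchSession:
    """Hypothetical stand-in for a search handle: all state lives on the instance."""

    def __init__(self, index):
        # "Open": everything the search needs is created here, not in globals.
        self.index = index
        self.results = []
        self.cursor = 0

    def search(self, word):
        # Results are stored on this session only.
        self.results = sorted(doc for doc, words in self.index.items() if word in words)
        self.cursor = 0
        return len(self.results)

    def next(self):
        # Return the next hit, or None when the result list is exhausted.
        if self.cursor >= len(self.results):
            return None
        hit = self.results[self.cursor]
        self.cursor += 1
        return hit

    def close(self):
        # "Close": drop everything the session created.
        self.results = []
        self.index = None


def run_query(index, word, out):
    # One full open/search/next/close loop, as each request thread would run it.
    session = SearchSession(index)
    try:
        session.search(word)
        hits = []
        hit = session.next()
        while hit is not None:
            hits.append(hit)
            hit = session.next()
        out[word] = hits
    finally:
        session.close()


# Made-up index: document name -> set of words it contains.
INDEX = {"a.html": {"swish", "apache"}, "b.html": {"swish", "tcl"}, "c.html": {"tcl"}}


def demo():
    out = {}
    threads = [threading.Thread(target=run_query, args=(INDEX, w, out)) for w in ("swish", "tcl")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Whether the real library behaves this way depends on it having no hidden global or static state, which is exactly the open question here.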

>Ideally, I would like each thread to simply call a function that performs 
>the SwishOpen, SwishSearch, SwishNext, SwishClose loop and returns a list 
>of hits.

>But it sounds as though
>a)  I might run into thread safety issues.
>b)  I might run into memory issues

>Can someone clarify this?  What issues are likely to arise?  

I think we can track down the memory leaks.

I wonder more about the design of swish.  Being thread safe may not be what
we want ;).  I would think the goal of a threaded server would be to avoid
duplicating the large memory structures to hold the index(es).  

Embedding swish into Apache (each Apache child) can be a problem since
although the code might be shared (copy-on-write), the data structures
probably are not.  So, the trick might be to get a swish server to share a
pool of loaded index files.  But, I have no idea how that would be done.

>And let me 
>make sure, after a SwishClose, is the memory for that search returned?

Yes, the goal is to return all memory at the end of a request.  Look
in swish2.c at SwishNew() (called by SwishOpen) and SwishClose().

>>My limited experience when testing an embedded swish on my linux machine
>>with Apache was that I didn't see much shared memory, and that linux was so
>>good at caching the swish-e binary that I didn't see that much improvement
>>in speed between using the library and the binary when tested with Apache
>>Benchmark.
>
>Can you clarify this?  Which binary are you referring to?  What was your 
>architecture?  I am not that familiar with Apache, am I right to think the 
>conventional swish solution is a cgi/perl based process forking solution?

The swish-e binary.  I set up a mod_perl script (which is a pre-compiled
perl program loaded into the Apache child) that ran swish two different
ways: one way was to fork (fork the Apache child and exec swish-e), and the
other way was to use the swish-e C library which avoids the fork.

Using Apache Benchmark to see how many requests per second I could do
showed not much difference.  But I was testing on a Linux machine with
nothing else running, so Linux was caching the swish-e binary, caching file
reads, and a fork under Linux is super fast.  So it wasn't really a good
test as everything was probably in memory anyway.

On a loaded machine I'd expect that the library would be faster, but at
the expense of memory.

>>There's been a lot of work to make swish thread safe, with the goal of
>>building a swish server, which would be a lot nicer on memory usage for
>>something like Apache.
>
>Now regarding thread safety, there are two alternate approaches I can take:
>
>I might:
>
>1.  devote one thread within AOLserver just for running SWISH and have the
>     other threads delegate all their searching to that one thread
>     multiplexing and coordinating by way of queues and mutexes

That makes sense, especially if all the threads are reading from the same
index file.  On the other hand, since so much time is probably spent
waiting for I/O, I wonder if multiple threads would help -- especially on a
multiprocessor machine.  (I assume threads can span CPUs, yes?)
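
The single-search-thread approach could be sketched in Python as below. search_index is a hypothetical stand-in for the real search call, and the queue layout is just one way to do it. Request threads never touch the library. They put a query on a queue and wait for the answer on a private reply queue.

```python
import queue
import threading


def search_index(query):
    # Hypothetical stand-in for the real search call.
    return [f"hit for {query}"]


requests = queue.Queue()


def search_worker():
    # Only this thread ever calls into the search library.
    while True:
        item = requests.get()
        if item is None:       # sentinel: shut the worker down
            break
        query, reply = item
        reply.put(search_index(query))


def search(query):
    # Called from any request thread; blocks until the worker answers.
    reply = queue.Queue(maxsize=1)
    requests.put((query, reply))
    return reply.get()


def demo_worker():
    worker = threading.Thread(target=search_worker, daemon=True)
    worker.start()
    try:
        return [search("apache"), search("tcl")]
    finally:
        requests.put(None)
        worker.join()
```

The cost is that searches are serialized, so this gives up whatever could be gained from overlapping I/O waits or from multiple CPUs.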


>2.  build an external process swish-e server (harder but perhaps better
>     for the swish-e community)

Oh, ah, let's see.  You need an account at sourceforge and CVS access....

>An oddly phrased question: how stable is the swish-e library?

Rock solid. ;)

>Am I 
>likely to experience memory corruption in the swish-e datastructures (or 
>elsewhere in my process) by using the 2.1 library?

Na, the segfaults will kill you first ;)

The library is swish itself.  There are bugs in the code, no doubt -- maybe
related to threading issues, and maybe logic errors, and maybe memory leaks
(although we spent some time on hunting down leaks lately).  But, it's open
source and can be fixed...

>If so, I would prefer 
>to create the external process swish-e server today!
>
>So regarding a swish-e server, does anyone have a suggested 
>architecture?
>And one final question: years ago, I heard of some unix 
>utility that you could wrap around a library to turn it into a simple 
>daemon, does anyone have a clue as to what it was I heard?

I'm a perl programmer, and I've been tempted to build a forking swish
server with perl -- not very hard at all.  It would probably have a smaller
memory impact than, say, embedding swish (and perl) into Apache since a few
swish processes could probably serve a bunch of Apache children.  But, I
still think the goal is to pool and share the indexes.

A server is really a better, more scalable solution since it can be moved
to other machine(s) as demand goes up.
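
A rough sketch of such a server, in Python rather than perl, and using one thread per connection instead of forking so that it stays portable. The one-line protocol, search_index, and the hit names are all invented for illustration. A real server would call into swish where search_index is.

```python
import socket
import socketserver
import threading


def search_index(query):
    # Hypothetical stand-in for a real search; returns a list of hit strings.
    return [f"{query}-1.html", f"{query}-2.html"]


class SearchHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Made-up protocol: one query line in, one hit per line out,
        # then the connection closes.
        query = self.rfile.readline().decode("utf-8").strip()
        for hit in search_index(query):
            self.wfile.write((hit + "\n").encode("utf-8"))


def query_server(address, query):
    # Client side: what a web server child would do for each request.
    with socket.create_connection(address) as conn:
        conn.sendall((query + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return [line.strip() for line in reader]


def demo_server():
    # Port 0 asks the OS for any free port.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), SearchHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            return query_server(server.server_address, "swish")
        finally:
            server.shutdown()
```

Because the clients only need a host and port, the search side can be moved to another machine without changing them.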

I'm interested in this topic, but I just don't have much knowledge to offer.


Bill Moseley
mailto:moseley@hank.org
Received on Sat Jun 2 19:07:02 2001