From: Mark Gaulin
Sent: Monday, May 17, 1999 6:44 PM
To: Rainer Scherg
Subject: RE: [SWISH-E] Re: DOC-Properties
Have you tried the latest versions of swishe (1.3.2)? There was a memory
leak bug in freeing regular expression memory that would appear even if you
were not using regexes in your config file. It was fixed recently (maybe
even in 1.3.1?). This affected searching primarily but may also have affected indexing.
-- I've used swish 1.3.2 (from sunsite.berkeley.edu) enhanced with
the filter option to index pdf's and other docs...
I recompiled my copy of swish (1.3.0) with SUPPORT_DOC_PROPERTIES
and indexed a small set of documents and compared memory usage. There was
a tiny difference in memory used (less than 0.2%). (I am running on NT and
there are some C runtime library functions that made it easy to see how
much memory was being used. You may have similar functions available to
you on your platform.)
-- I'm using Sun Solaris 2.6, using the top tool to roughly measure the
amount of used memory and program "performance"/impact.
Note: I needed to add a "#ifdef SUPPORT_DOC_PROPERTIES" / "#endif" block
around some code in the function addMetaMergeList() in merge.c to get it
to compile without SUPPORT_DOC_PROPERTIES. You would have to do the same to get
a clean compile with the latest version of swishe.
-- Yep, I've reported this as bug #12 in the swish-e bug-db, when trying
to compile swish without the DOC_PROP support.
Q: Did you comment out/delete the "#define SUPPORT_DOC_PROPERTIES 1" or
did you change it to "#define SUPPORT_DOC_PROPERTIES 0" in your config.h?
Since "#ifdef" is used as the test in the code you would need to
remove/comment out the #define line to really get rid of the doc property
code. Perhaps the code should be changed to use #if instead of #ifdef.
-- As stated: the DOC_PROP feature is disabled (commented out).
I have two "bottom line" responses at this point:
1. So far I have not found a memory-leak type bug associated with
Properties. It doesn't mean it's not there but since you are not even
using that feature, most of the code related to Doc Props is not even called, so
it makes it even more unlikely to be the direct cause of excessive memory
use during indexing.
-- What was curious is not the huge amount of memory usage, but the
difference between the index file sizes (@200MB to @30MB).
2. As we all know, swishe uses RAM to store all indexing information,
which causes it to use more memory than it could if the speed/memory usage
tradeoff was weighed differently. I could see doing a first-order
approach to using less RAM without affecting indexing speed too much just by storing
file names, titles (and document properties) in a temp file during
indexing. Storing the word lists and the index itself on disk during
indexing would be more complex and would slow things down noticeably but
for really huge sets of files this might be needed to keep RAM usage down.
Sorry I could not be of more help. I will continue to work on this as
information comes in.
-- Tnx a lot for your help.
At this point I can live without the DOC_PROP support.
But if I have a little more time to spare, I will include some debug code
to see what's happening.
[lots of stupid outlook quote stuff deleted]
Received on Tue May 18 03:45:05 1999