Re: Shrinking swish-e memory footprint

From: Michael <michael(at)not-real.bizsystems.com>
Date: Wed Sep 26 2001 - 21:14:10 GMT
> At 11:09 AM 09/19/01 -0800, you wrote:
> >> At 10:42 AM 09/19/01 -0800, Michael wrote:
> >> >I need to. I have years worth of search database files that are 
> >> >updated daily.
> >> 
> >> How many files?  I have some indexes that took 10 hours to index
> >> with 1.3.2 and now take about 10 minutes in the dev version....
> >
> >Over 100,000
> 
> What I have done in the past is to index everything, and then run
> indexing on just new files and then search using -f index.full
> index.new.  Then every once in a while reindex everything into
> index.full again.
> 
> >This is on a production server and it has to work bug free. I am not 
> >real comfortable with using development code.
> 
> I can understand that.  Although, I'd bet there have been more bug
> fixes from the old code than creation of new bugs ;)  But all it
> takes is one...
> 
> You should really test out the development code and see how your
> indexing times and memory requirements change.  You don't have to
> use it in production yet.
> 

It's not really practical for me to try the dev code until merge is 
working. On 2.0.5, I stopped a full index of ~97k files when the 
memory requirements grew beyond 600 megabytes, after about a day and 
a half of indexing. I was able to index and merge the data 
incrementally instead. The last merge involved a 66meg index and a 
1.5meg index, and the memory footprint was just shy of 600megs. This 
seems a bit inefficient.
The merge was done incrementally from 47 index files that are each 
about 1.5megs. The merge of the first two created a 50+meg memory 
footprint, and it grew from there to 600+megs for the final merge. I 
tried it in one fell swoop, but it did not appear to be doing well 
after about half a day, so I stopped it and used the incremental 
approach. The guidelines in the FAQ seem to indicate that the memory 
required should be about twice the size of the file -- I'm finding 
that 10x is much closer. Hopefully 2.2 can improve on this.
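
For anyone who wants to reproduce the numbers above, here is a rough 
Python sketch of the incremental approach and the ratio arithmetic. 
It only plans the merges; it does not run swish-e. The file names 
(index.01, merged.01, ...) are made up for illustration, and the 
"swish-e -M left right out" form in the comment is my assumption 
about the merge syntax, so check it against the docs for your 
version.

```python
def merge_plan(index_files, merged_prefix="merged"):
    """Plan an incremental merge: fold one more index into the running result each step.

    Returns a list of (left, right, output) tuples. File names are hypothetical.
    """
    if len(index_files) < 2:
        return []
    steps = []
    current = index_files[0]
    for n, nxt in enumerate(index_files[1:], start=1):
        out = f"{merged_prefix}.{n:02d}"
        # Each step would correspond to something like:
        #   swish-e -M <current> <nxt> <out>   (assumed syntax, verify locally)
        steps.append((current, nxt, out))
        current = out
    return steps


def memory_ratio(peak_mb, *index_sizes_mb):
    """Peak memory divided by the combined size of the indexes being merged."""
    return peak_mb / sum(index_sizes_mb)


# 47 monthly index files, as described in the message.
files = [f"index.{i:02d}" for i in range(1, 48)]
plan = merge_plan(files)
print(len(plan))  # 46 merge steps for 47 files

# Final merge: 66meg + 1.5meg indexes, roughly 600megs of memory.
print(round(memory_ratio(600, 66, 1.5), 1))  # 8.9, versus the FAQ's ~2x
```

The 8.9 figure is 600 / 67.5, which is where the "10x is much closer" 
estimate comes from.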

If you'd like the various index and config files for test cases 
you're welcome to them. They get generated every month and get bigger 
every month so I'm looking forward to trying 2.2 when it is ready for 
prime time.

See:
http://www.insulin-pumpers.org/search.cgi

for the interface. The USA mail archive is the one that is detailed 
above. It is broken into three index files:
1 -- all except the last two calendar months
1 -- last month, merged into above monthly
1 -- current month, indexed daily
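
A minimal sketch of that three-way split, in Python, in case the 
layout is unclear. The index file names are hypothetical, not the 
ones on the server. The search command just lists all three indexes 
after -f, in the same way as the "-f index.full index.new" suggestion 
quoted above; -w is the swish-e switch for the query words.

```python
from datetime import date

# Hypothetical index file names for the three tiers.
ARCHIVE = "index.archive"      # everything except the last two calendar months
LAST_MONTH = "index.lastmonth" # merged into the archive monthly
CURRENT = "index.current"      # re-indexed daily


def months_back(msg_date, today):
    """Whole calendar months between the message's month and the current month."""
    return (today.year - msg_date.year) * 12 + (today.month - msg_date.month)


def index_for(msg_date, today):
    """Pick the index file a message dated msg_date would live in."""
    age = months_back(msg_date, today)
    if age <= 0:
        return CURRENT
    if age == 1:
        return LAST_MONTH
    return ARCHIVE


def search_command(query):
    """Argument list for one search across all three indexes."""
    return ["swish-e", "-w", query, "-f", ARCHIVE, LAST_MONTH, CURRENT]


today = date(2001, 9, 26)
print(index_for(date(2001, 9, 20), today))  # index.current
print(index_for(date(2001, 8, 31), today))  # index.lastmonth
print(index_for(date(2001, 7, 1), today))   # index.archive
print(search_command("infusion"))
```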

Michael
Michael@Insulin-Pumpers.org
Received on Wed Sep 26 21:15:00 2001