> At 11:09 AM 09/19/01 -0800, you wrote:
> >> At 10:42 AM 09/19/01 -0800, Michael wrote:
> >> >I need to. I have years worth of search database files that are
> >> >updated daily.
> >> How many files? I have some indexes that took 10 hours to index
> >> with 1.3.2 and now take about 10 minutes in the dev version....
> >Over 100,000
> What I have done in the past is to index everything, and then run
> indexing on just new files and then search using -f index.full
> index.new. Then every once in a while reindex everything into
> index.full again.
> >This is on a production server and it has to work bug free. I am not
> >real comfortable with using development code.
> I can understand that. Although, I'd bet there have been more bug
> fixes from the old code than creation of new bugs ;) But all it
> takes is one...
> You should really test out the development code and see how your
> indexing times and memory requirements change. You don't have to
> use it in production yet.
It's not really practical for me to try the dev code until merge is
working. On 2.0.5, I stopped a full index of ~97k files when the memory
requirements grew beyond 600 megabytes -- about a day and a half into
indexing. I was able to index and merge the data incrementally. The
last merge involved a 66meg index and a 1.5meg index. The memory
footprint was just shy of 600megs. This seems a bit inefficient???
The merge was done incrementally from 47 index files that are each
about 1.5megs. The merge of the first two created a 50+meg memory
footprint and grew from there to 600+megs for the final merge. I
tried it in one fell swoop, but it did not appear to be doing well
after about half a day, so I stopped it and used the incremental
approach. The guidelines in the FAQ seem to indicate that the memory
required should be about twice the size of the file -- I'm finding
that 10x is much closer. Hopefully 2.2 can improve on this.
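
To put numbers on the comparison above, here is a rough sketch of the
arithmetic in Python. It assumes "the size of the file" in the FAQ means
the combined size of the indexes being merged, and it rounds the observed
footprint to 600 megs. The function names are made up for illustration
and are not part of swish-e.

```python
# Figures quoted in the message (all sizes in megabytes).
FINAL_MERGE_INPUTS_MB = [66.0, 1.5]  # the two indexes in the last merge
OBSERVED_FOOTPRINT_MB = 600.0        # "just shy of 600megs", rounded (assumption)
FAQ_FACTOR = 2.0                     # FAQ guideline: about twice the file size


def memory_ratio(index_sizes_mb, footprint_mb):
    """Observed memory footprint divided by the total size of the inputs."""
    return footprint_mb / sum(index_sizes_mb)


def faq_estimate(index_sizes_mb, factor=FAQ_FACTOR):
    """Memory the FAQ guideline would predict for merging these inputs."""
    return factor * sum(index_sizes_mb)


if __name__ == "__main__":
    ratio = memory_ratio(FINAL_MERGE_INPUTS_MB, OBSERVED_FOOTPRINT_MB)
    predicted = faq_estimate(FINAL_MERGE_INPUTS_MB)
    print(f"observed ratio: {ratio:.1f}x")
    print(f"FAQ would predict about {predicted:.0f} megs")
```

Under those assumptions the final merge comes out at a bit under 9x the
input size, against roughly 135 megs predicted by the 2x guideline.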
If you'd like the various index and config files for test cases
you're welcome to them. They get generated every month and get bigger
every month, so I'm looking forward to trying 2.2 when it is ready
for the interface. The USA mail archive is the one that is detailed
above. It is broken into three index files:
1 -- all except the last two calendar months
1 -- last month, merged into above monthly
1 -- current month, indexed daily
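
The three-way split above could be expressed roughly as follows. This is
only a sketch of the bucketing rule, not how the archive scripts actually
work; the bucket names and the function are hypothetical.

```python
from datetime import date


def index_bucket(message_date, today):
    """Pick which of the three index files a message belongs in.

    Bucket names are hypothetical labels for the three files in the list:
      "archive"       -- everything except the last two calendar months
      "last_month"    -- the previous calendar month
      "current_month" -- the month in progress, reindexed daily
    """
    # Whole calendar months between the message and today.
    months_back = (today.year - message_date.year) * 12 + (
        today.month - message_date.month
    )
    if months_back <= 0:
        return "current_month"
    if months_back == 1:
        return "last_month"
    return "archive"


if __name__ == "__main__":
    today = date(2001, 9, 26)
    for d in (date(2001, 9, 19), date(2001, 8, 31), date(2001, 7, 31)):
        print(d, index_bucket(d, today))
```

Only the current-month file needs daily indexing under this scheme; the
other two change once a month when the month rolls over.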
Received on Wed Sep 26 21:15:00 2001