I've looked through the FAQ and the discussion list archive, but haven't
found a definitive answer, so hopefully you can help us out.
We already use Swish-E to index a small amount (~300 MB) of various
files (.doc, .ppt, etc.). To handle that we have set up an hourly job in
our scheduler which recreates the index from scratch every time. The index
has to be rebuilt every hour because churn in the filesystem is quite high.
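For reference, the current job is roughly the following crontab entry
(the paths and config file name are placeholders; -c names the Swish-E
configuration file and -f the index file to write):

```
# Rebuild the whole index from scratch at the top of every hour
0 * * * *  swish-e -c /etc/swish-e/swish.conf -f /var/index/index.swish-e
```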
We now need to index some quite large filesystems (> 1 GB) and some huge
intranet websites, but we still need the index rebuilt every hour. The
problem with our current approach is that building the index from scratch
now takes longer than the one-hour window.
I've read a lot about the incremental indexing feature, which seems to be
exactly what we need: build the index once, then later just add the new
documents and remove the deleted ones from the index.
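To drive that, the hourly job would also have to work out which files were
added or deleted since the last run. A minimal sketch of that bookkeeping in
plain shell (the directory and file names are just placeholders, and nothing
here is Swish-E specific):

```shell
#!/bin/sh
# Sketch: remember which files were indexed on the last run, then diff that
# list against the current state of the tree; only the differences need to
# be handed to the indexer.
DOCROOT=${DOCROOT:-./docs}          # illustrative document root
STATE=${STATE:-./indexed-files.txt} # file list from the previous run

mkdir -p "$DOCROOT"
printf 'demo\n' > "$DOCROOT/sample.doc"  # demo file so the diff reports something
touch "$STATE"                           # first run: previous list is empty

find "$DOCROOT" -type f | sort > "$STATE.new"

# In the current tree but not in the previous list -> index these.
comm -13 "$STATE" "$STATE.new" > added.txt
# In the previous list but gone from the tree -> drop these from the index.
comm -23 "$STATE" "$STATE.new" > removed.txt

mv "$STATE.new" "$STATE"
echo "added: $(wc -l < added.txt), removed: $(wc -l < removed.txt)"
```

On each subsequent run the same diff yields exactly the files to feed to the
indexer and the files to remove from the index.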
So if I build Swish-E with --enable-incremental and later use -r to make
sure "old" documents get removed from the index, will that handle those
amounts of data? Or do you still see problems, given that 2.5 is still
only a beta?
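In other words, what I have in mind is roughly the following (the config
file name is a placeholder, and I'm not sure of the exact -r invocation,
so please correct me if this isn't how it's meant to be used):

```
# Build once with incremental support enabled
./configure --enable-incremental
make && make install

# Then, every hour: index the new documents and remove the deleted ones,
# e.g. dropping a vanished file from the existing index with -r
swish-e -c swish.conf -r deleted-file.doc
```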
A colleague has already evaluated http://search.mnogo.ru/, which also seems
to do exactly what we need, but it is still completely uncharted territory
for us, so it would be great if you have some ideas on how we can
accomplish this.
Received on Mon Nov 29 07:00:17 2004