We've got an existing swish-e installation (version 2.4.5) that's indexing
a constantly growing set of XML data. As we get incoming data we use -S prog
-i stdin to index to a temporary index file, and then merge that into the
master index. So far so good - we've got close to a million articles.
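
To make the setup concrete, here is a minimal sketch (not our production code) of how a record pulled from a database can be framed for -S prog -i stdin. As far as I understand the prog input format, each document is preceded by Path-Name and Content-Length headers and a blank line. The function name and the sample path are made up for illustration.

```python
import sys


def prog_document(path_name: str, content: bytes) -> bytes:
    """Frame one record for swish-e's prog input: headers, blank line, body.

    path_name is whatever identifier the index should store for the record
    (hypothetical here; for database rows it could be a key or a URL).
    """
    headers = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(content)}\n"  # length of the body in bytes
        "\n"
    )
    return headers.encode("utf-8") + content


if __name__ == "__main__":
    # Example: write one framed XML record to stdout, so this script's
    # output can be piped into swish-e running with -S prog -i stdin.
    xml = b"<article><title>Example</title></article>"
    sys.stdout.buffer.write(prog_document("article/1", xml))
```

The point of the sketch is that nothing here depends on a file on disk or its modification time, which is why a file-based "newer than" test does not map onto this kind of input.
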
From what I've read (I'm taking over the project) I'd prefer to switch to
incremental indexing. Firstly, using incremental indexing should cut out the
cumbersome merging step. Secondly, and more importantly, we have to be able
to remove content from the index as some of our content has contractual
expiry dates and we have to remove it from our data set.
I've got two questions on incremental indexing, though:
1) The only documentation I can find on creating an incremental index is
using -N (or Update-Mode), providing a filename - only files newer than the
given file will be indexed. But we don't index files - our data resides in a
database so we use stdin to provide the data to swish-e on the cmdline. Is
it possible to create an incremental index if your data doesn't live in files?
2) How stable is incremental indexing by now? It's been available, from
what I can see, since December 2003. And there are some references in the
archive of people who use it successfully. But there is also a post or two
about stability problems, and the latest documentation still advises to ask
on this forum before using.
Users mailing list
Received on Tue Oct 23 08:58:29 2007