Hello fellow swisheans!
We've got a few million documents indexed with swish here,
distributed across multiple indices. Queries that access
all of the indices and that generate a lot of hits are running
us out of physical memory (we've got 8GB too!). For example,
if some silly user issues a query like: 'a*' across all
the indices, it will generate many millions of hits. The process
that is querying the indices via the API will grow bigger than
available physical memory and start the machine thrashing.
Swish seems to collect *all* the hits in memory so that it can rank
them, before returning any hits at all. If we don't care about the
ranking, is there some way to get hits as they occur and not incur
the big memory storage penalty? We'd like to halt the search when
the number of hits exceeds some threshold.
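
To make the request concrete, here is a rough sketch in Python of the
behaviour we are after. It is not Swish-e code and does not use the Swish-e
API: scan_index, search_with_cutoff and max_hits are made-up names, and each
"index" is just a list of strings standing in for a real index. The point is
only that hits are handed back one at a time, unranked, and the scan stops
as soon as the threshold is reached.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def scan_index(docs: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield matches one at a time (hypothetical stand-in for an index scan)."""
    for doc in docs:
        if doc.startswith(prefix):
            yield doc


def search_with_cutoff(indices: List[Iterable[str]], prefix: str,
                       max_hits: int) -> List[str]:
    """Return at most max_hits unranked hits, stopping the scan at the cutoff."""
    def all_hits() -> Iterator[str]:
        # Walk the indices in order; nothing is buffered or sorted here.
        for index in indices:
            yield from scan_index(index, prefix)

    # islice stops pulling from the generator once max_hits items are taken,
    # so memory use is bounded by max_hits rather than by the total hit count.
    return list(islice(all_hits(), max_hits))


if __name__ == "__main__":
    toy_indices = [["apple", "banana", "avocado"],
                   ["apricot", "cherry", "almond"]]
    print(search_with_cutoff(toy_indices, "a", 3))
```

With a cutoff like this, a query such as 'a*' would cost no more memory than
max_hits results, however many documents actually match.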
Received on Mon Jan 9 12:47:58 2006