On Wed, Oct 15, 2003 at 11:35:22AM -0700, Sean Downey wrote:
> The number of documents grows by about 5000 per week. Stories can be
> modified during the day - but would not usually be modified after 7 days.
Will old stories ever be removed? Swish-e is fast but was not designed
for an ever-increasing number of documents. Scalability is an issue.
>
> My current line of thinking is that there should be three index DBs.
>
> M - the Main Index
> S1 - a small index which would store stories back to the last Sunday.
> S2 - a small index which would store stories from the last Sunday to the
> Sunday before.
So the point of S2 is to allow for merging, correct?
> The search would use M, S1 & S2.
>
> does this sound reasonable?
> or is there a better way of doing this?
> I have read a few topics about staying away from the merge - is merging
> still a problem?
I'd suggest testing, of course (and reporting back your findings).
Merge should work mostly like normal indexing. It avoids the re-parsing
of documents, but it has to do additional sorting of all the word data
to accomplish the merge. So there are some trade-offs, and I think testing
is the only way to see what happens with your data.
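
For testing, the M/S1/S2 split described above could be sketched roughly
as below. This is only an illustration of the date bucketing, not Swish-e
code: the function names (last_sunday, index_for) are hypothetical, and it
assumes "back to the last Sunday" includes that Sunday itself.

```python
from datetime import date, timedelta


def last_sunday(today: date) -> date:
    """Return the most recent Sunday on or before `today`."""
    # date.weekday() is Monday=0 ... Sunday=6, so this is days since Sunday.
    days_since_sunday = (today.weekday() + 1) % 7
    return today - timedelta(days=days_since_sunday)


def index_for(story_date: date, today: date) -> str:
    """Pick which index (hypothetical labels M, S1, S2) should hold a story."""
    s1_start = last_sunday(today)               # S1: last Sunday up to today
    s2_start = s1_start - timedelta(days=7)     # S2: the week before that
    if story_date >= s1_start:
        return "S1"
    if story_date >= s2_start:
        return "S2"
    return "M"                                  # everything older


if __name__ == "__main__":
    today = date(2003, 10, 15)
    for d in (date(2003, 10, 15), date(2003, 10, 8), date(2003, 9, 30)):
        print(d, index_for(d, today))
```

Under this scheme, each Sunday the old S2 would be folded into M and the
old S1 would become the new S2, so only stories that can no longer change
ever reach the large index.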
--
Bill Moseley
moseley@hank.org
Received on Wed Oct 15 19:56:04 2003