I'd like your comments on this proposed plan of action for indexing a large
news web site.
The number of documents grows by about 5000 per week. Stories can be
modified during the day - but would not usually be modified after 7 days.
My current line of thinking is that there should be three index DBs.
M - the Main Index
S1 - a small index which would store stories back to the last Sunday.
S2 - a small index which would store stories from the last Sunday to the
Every night a new batch of stories comes in.
Every night (apart from Sundays) a cron job would reindex S1 and S2.
Then, on Sunday nights:
S2 would be merged into M,
S1 would be renamed S2,
and then the new stories would be indexed into S1.
The search would use M, S1 & S2.
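
To make the rotation concrete, here is a rough Python sketch of the nightly decision. It is only a toy model: each index is a plain set of story IDs, the function names (nightly_actions, run_night) are made up for illustration, and nothing here calls a real indexer. It also assumes the Sunday night batch goes into the fresh S1.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday(): Monday is 0, Sunday is 6


def nightly_actions(today):
    """Return the ordered steps the nightly job would run on the given date."""
    if today.weekday() == SUNDAY:
        return ["merge S2 into M", "rename S1 to S2", "index new stories into S1"]
    return ["reindex S1", "reindex S2"]


def run_night(indexes, new_stories, today):
    """Apply one night's plan to a toy model where each index is a set of story IDs."""
    if today.weekday() == SUNDAY:
        indexes["M"] |= indexes["S2"]      # S2 is merged into M
        indexes["S2"] = indexes["S1"]      # S1 is renamed S2
        indexes["S1"] = set(new_stories)   # a fresh S1 takes the new batch
    else:
        # Reindexing S1 picks up the new batch. S2 would also be rebuilt to
        # catch edits, but story content is not modelled in this sketch.
        indexes["S1"] |= set(new_stories)
    return indexes


if __name__ == "__main__":
    indexes = {"M": set(), "S1": set(), "S2": set()}
    day = date(2003, 10, 13)  # a Monday
    for _ in range(14):
        # One hypothetical story per night, named after its date.
        run_night(indexes, [day.isoformat()], day)
        print(day, nightly_actions(day), {k: len(v) for k, v in indexes.items()})
        day += timedelta(days=1)
```

In this model every story lives in exactly one of M, S1 and S2 at any time, which is why the search has to query all three.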
Does this sound reasonable?
Or is there a better way of doing this?
I have read a few threads that recommend staying away from merging - is merging
still a problem?
Received on Wed Oct 15 18:36:04 2003