> Doing a "merge" for hundreds of web sites is tedious.
The common idiom for this is to break your updates down into smaller
groups, and have your searches check multiple indexes. For instance, I
run a mailing list with about 600 messages per week. Every 4 minutes I
rebuild the index for "this week's" messages; once per week I reindex
"this year except for this week"; and once per year I reindex "all time
except this year". My searches run against all three indexes, I spend
very little time rebuilding, and I haven't touched the cron jobs in
almost two years :-)
In the fall, the "this year except this week" rebuild begins to take longer, but
I think the worst it gets is about 15 minutes once per week. Not too
bad.
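
To make the three-way split concrete, here is a rough sketch of how a message date could be mapped to one of the three indexes. This is not my actual cron setup: the function name, the tier labels, and the choice of Monday as the start of the week are all assumptions for illustration.

```python
from datetime import date, timedelta


def tier_for(message_date: date, today: date) -> str:
    """Pick which of the three indexes should contain a message.

    Tier labels are hypothetical; the week is assumed to start on Monday.
    """
    # Monday of the current week (weekday() is 0 for Monday).
    week_start = today - timedelta(days=today.weekday())
    year_start = date(today.year, 1, 1)

    if message_date >= week_start:
        return "this_week"   # rebuilt every few minutes
    if message_date >= year_start:
        return "this_year"   # rebuilt once per week
    return "all_time"        # rebuilt once per year


if __name__ == "__main__":
    today = date(2005, 9, 6)
    for d in (date(2005, 9, 5), date(2005, 9, 4), date(2004, 12, 31)):
        print(d, tier_for(d, today))
```

Each tier only covers what the faster tiers leave out, so the frequent rebuild stays small and a search just has to consult all three index files.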
But: I think you have a different problem, since you need to delete
pages too, right? You might want to do these macro merges with a
different metric, trying to optimize for not rebuilding pages that
aren't changing much. In the long run, this might not scale for you.
Swish-e isn't a serious competitor to Google ...
/jordan
Received on Tue Sep 6 07:28:45 2005