> How scalable Swish-e is, if we crawl million of pages,
We use swish-e to index local files, not web sites, so I can't venture an opinion on the crawling side as such. What I can say is that the core indexing and searching scale pretty well: we've got about 2 million content pages indexed, add about 10,000 more daily, and the searches are fast (sub-second).
We do play around a bit to speed up the indexing: during the day, as we receive new files to index, we index them into a 'daily' index file. A nightly job merges the daily file into a main 'master' index file. Searches are run against both index files.
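
A rough sketch of the daily/master arrangement described above, in Python, in case it helps. It only builds the command lines and does not run anything. The file names are made up for illustration, and the switches (-f for index files, -w for search words, -M for merge with the output index as the last argument) are as I understand them from the swish-e documentation, so please check them against your installed version.

```python
from pathlib import Path

# Hypothetical file names; the setup described above does not name its files.
MASTER = Path("master.index")
DAILY = Path("daily.index")
MERGED = Path("master.index.new")


def search_command(query, indexes=(MASTER, DAILY)):
    """Build a swish-e command line that searches several index files at once."""
    # -f accepts one or more index files; -w introduces the search words.
    return ["swish-e", "-f", *[str(p) for p in indexes], "-w", query]


def merge_command(master=MASTER, daily=DAILY, merged=MERGED):
    """Build a swish-e command line that merges master and daily into a new index."""
    # -M lists the input indexes first; the last argument is the output index.
    return ["swish-e", "-M", str(master), str(daily), str(merged)]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    print(" ".join(search_command("scalability")))
    print(" ".join(merge_command()))
```

The merge writes to a separate output file rather than over the master. After a successful merge the nightly job would presumably swap the new file in for the master and start a fresh daily index. That swap step is an assumption on my part and is not shown here.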
Users mailing list
Received on Tue Feb 19 04:57:01 2008