I'd like to hear your suggestions for doing large-scale multi-server
indexing with swish-e.
(1) What are the pros and cons of doing a single big index
(giving it starting URLs across all servers) vs. doing a number of
small indexes and merging them?
(2) What issues are likely to cause problems in scaling up?
(3) How large are some indexes that people have created successfully,
and what hardware/time does it take to do it?
The case I'm interested in is creating a campus-wide index of the
semi-official servers at our university.
No one knows exactly how much is out there to index, but rough
guesses suggest 200-300 servers, with something like 100,000 -
200,000 HTML pages.
Albert Lunde Albert-Lunde@nwu.edu
Received on Fri Jun 9 13:17:14 2000