I use the merge option quite frequently with the spider option to index many
domains. It is slower than indexing the file system directly, but you can
launch several spiders at once, build several indexes at once, and then merge
them. It is also nice because all the URLs correctly point to their respective
domains without having to do a lot of substitutions in the URLs and paths.
Also, you can kill the spider when you think you have indexed enough pages of
a particular site and the swish-e engine will finish up the index. Finally, if
something goes wrong while spidering/indexing, only a small part of your
entire work is lost...
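A minimal sketch of that workflow, assuming swish-e's external spider is run via -S prog and that -M merges existing indexes (the config and index file names below are hypothetical):

```shell
# Spider two sites in parallel, each into its own index
# (site1.conf and site2.conf are hypothetical spider configs).
swish-e -S prog -c site1.conf -f site1.index &
swish-e -S prog -c site2.conf -f site2.index &
wait

# Merge the per-site indexes into one searchable index.
swish-e -M site1.index site2.index merged.index
```

Because each site lives in its own index until the merge, killing one spider early or losing one run only costs you that site's index.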
[mailto:firstname.lastname@example.org] On Behalf Of email@example.com
Sent: Tuesday, June 08, 2004 2:48 PM
To: Multiple recipients of list
Subject: [SWISH-E] Incremental updating
Is it possible to do some kind of incremental update (via spider or file
system)? If not, how often would you recommend indexing a large (1,000,000+
hits a month) site?
Also, this is a problem for me when indexing a large number of sites at once.
We're trying to develop a search engine for our entire program, which has
around 100 domains. I started it last night and when I got in in the morning,
it had frozen, swish-e was using 630MB of RAM, and the computer was barely
usable. I'm using a 500 MHz P3, so something like this isn't unexpected, but
if I could do 5-10 sites a night, and then incrementally add more to the
index, it'd be great.
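The "5-10 a night" plan could be approximated with swish-e's merge option, folding each night's batch into a running master index (a sketch; file names are hypothetical, and note that swish-e writes a companion .prop file alongside each index):

```shell
# Index tonight's batch of sites into its own index
# (tonight.conf is a hypothetical spider config listing 5-10 sites).
swish-e -S prog -c tonight.conf -f batch.index

# Fold the new batch into the master index.
swish-e -M master.index batch.index master-new.index

# Swap in the merged result; move the property file too.
mv master-new.index master.index
mv master-new.index.prop master.index.prop
```

This keeps any single run small enough to fit in memory, at the cost of an extra merge pass per night.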
Received on Wed Jun 9 00:15:41 2004