Swish-e has a helper program, spider.pl, which spiders a single host. Can
we give it multiple hosts to spider at a time?
[mailto:firstname.lastname@example.org] On Behalf Of Judith Retief
Sent: Tuesday, February 19, 2008 3:27 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] Regarding scalibilty and multithreading in
> How scalable Swish-e is, if we crawl million of pages,
We use swish-e to index local files, not web sites, so I can't venture
any opinion on the crawling bit as such. But what I can say is that the
core technologies of indexing and searching scale pretty well - we've
got about 2 million content pages indexed, adding about 10 000 daily,
and the searches are fast (sub-second).
We do play around a bit to speed up the indexing; during the day, as we
receive new files to index, we index into a 'daily' index file. A
nightly job merges the daily file into a main 'master' index file.
Searches are done against all the files.
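
The daily/master routine described above might be sketched roughly as
follows. This is only an illustration, not the poster's actual scripts:
the file names (daily.conf, daily.index, master.index) and the helper
function names are hypothetical, and swish-e is assumed to be on the
PATH. The switches used (-c config file, -f index file(s), -M merge,
-w query words) should be checked against the documentation for your
swish-e version. The helpers only build the command lines; pass them to
subprocess.run() to execute them.

```python
from typing import List

SWISH = "swish-e"  # assumption: the swish-e binary is on PATH


def index_cmd(config: str, index_file: str) -> List[str]:
    """Command to index the sources named in `config` into `index_file`."""
    return [SWISH, "-c", config, "-f", index_file]


def merge_cmd(inputs: List[str], output: str) -> List[str]:
    """Command to merge several index files into a new output index."""
    if len(inputs) < 2:
        raise ValueError("need at least two index files to merge")
    if output in inputs:
        # Merge into a fresh file, then rename it over the old master.
        raise ValueError("output must differ from the input indexes")
    return [SWISH, "-M", *inputs, output]


def search_cmd(query: str, indexes: List[str]) -> List[str]:
    """Command to search one query across all the given index files."""
    return [SWISH, "-w", query, "-f", *indexes]


if __name__ == "__main__":
    # During the day: (re)build the small daily index as new files arrive.
    print(index_cmd("daily.conf", "daily.index"))
    # Nightly: fold the daily index into a new master index.
    print(merge_cmd(["master.index", "daily.index"], "master.new"))
    # Searches: query the master and daily indexes together.
    print(search_cmd("scalability", ["master.index", "daily.index"]))
```

Keeping the daily index small means the frequent re-indexing touches
only the new files, while the expensive merge runs once a night.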
Received on Tue Feb 19 05:03:23 2008