Multiple sites index: configuration and performance questions

From: Gaël Lams <lamsgael(at)not-real.gmail.com>
Date: Wed Feb 22 2006 - 13:04:40 GMT
Hi all,
I have to index roughly 14 web sites (for the time being). My understanding (probably wrong?) is that I have to tell the swish-conf to use spider.pl for indexing and then create the spider.config file with the array of web sites I have to index (testing with only two for the time being).
I then run "/usr/local/bin/swish-e -S prog -c swish.conf", but only the last web site in the array seems to be indexed. The first site (www.icftu.org, as you will see below in my configuration) does not seem to be taken into account: it does not appear in the terminal output, and my search tests confirm that it has not been indexed (I am able to search http://www.cmt-wcl.org).
I read the documentation but I'm probably missing something. Any help would be appreciated.
Also, as I will probably have to index around 50 web sites in the near future, I was wondering whether there are any "best practices" or advice for a scalable set-up.
You will find below my exact configuration:
Regards,
Gaël

- swish-e -V: SWISH-E 2.4.3
- perl -v: v5.8.1 built for i586-linux-thread-multi
- OS: Suse Professional 9.0, distribution's kernel 2.4.21
- swish-conf:
# Use spider.pl for indexing
IndexDir spider.pl

# Use spider.pl's default configuration and specify the URL to spider
# run it with /usr/local/bin/swish-e -S prog -c swish-e/swish.conf
SwishProgParameters spider.config

# Allow extra searching by title
Metanames swishtitle

# Set StoreDescription for each parser to display context with search results
StoreDescription TXT* 10000
StoreDescription HTML* <body> 10000

- spider.config:
my %site1 = (
   base_url   => 'http://www.icftu.org',
   email      => 'internetpo(at)not-real.icftu.org',
);

my %site2 = (
   base_url   => 'http://www.cmt-wcl.org',
   email      => 'info(at)not-real.cmt-wcl.org',
);

@servers = ( \%site1, \%site2 );
1;
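
On the scalability question, one possible layout for a longer site list is sketched below. It is only an illustration, not a tested fix for the missing first site: it assumes spider.pl accepts any list of hash references in @servers, exactly as in the configuration above, and the @sites table is a hypothetical helper that is not part of spider.pl.

```perl
# Hypothetical spider.config layout: one table row per site instead of
# one named hash per site. @sites is an illustrative helper name.
my @sites = (
    # base_url                   contact email
    [ 'http://www.icftu.org',   'internetpo(at)not-real.icftu.org' ],
    [ 'http://www.cmt-wcl.org', 'info(at)not-real.cmt-wcl.org' ],
);

# Build the same structure as above: a list of hash references,
# each with the base_url and email keys, in the order listed.
@servers = map { +{ base_url => $_->[0], email => $_->[1] } } @sites;

1;
```

Adding a site then means adding one row to the table; the resulting @servers has the same shape as the two-hash version.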
Received on Wed Feb 22 05:04:52 2006