Brad Bauer wrote on 7/8/08 8:44 PM:
> SWISH-E 2.2.1
> Linux www.domain.com 2.4.9-e.68 #1 Thu Jan 19 18:24:23 EST 2006 i686 unknown
First, get an up-to-date version. Version 2.2 was last maintained over 5 years ago.
> I have begun converting from fs to spidering, but find that downloading pdfs
> considerably slows the spidering process. So what I would like to do is
> index html/php/cgi using the spider, at the same time building a list of
> local pdfs for indexing using the considerably faster fs method.
> Is there an easy way to feed a specific list of files into swish-e for
I'm guessing you are using -S http under swish-e 2.2. In the 2.4.x releases that
method is deprecated in favor of using the spider.pl Perl script in conjunction
with the -S prog method.
I would suggest using spider.pl to fetch and cache all your content, then using
the -S prog swish-e option to index the cache. Alternatively, you could configure
spider.pl to download only certain content types and make multiple spidering
runs, creating one cache per run. You could then either build one index per
cache and merge them later, or index all the caches into a single index.
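
To illustrate the -S prog side of this, here is a rough sketch of a feeder program that takes a list of local files and writes them out in the format -S prog reads: a few header lines (Path-Name, Content-Length, Last-Mtime), a blank line, then exactly Content-Length bytes of content. It is in Python only for brevity. The script name, the list file, and the function names are made up for the example, and it does no PDF-to-text conversion, so PDFs would still need a filter somewhere along the way.

```python
import os
import sys


def format_prog_document(path, content, mtime):
    """Build one record: header lines, a blank line, then the raw bytes."""
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"  # length in bytes, not characters
        f"Last-Mtime: {int(mtime)}\n"
        "\n"
    )
    return headers.encode("utf-8") + content


def emit_documents(list_file, out):
    """Read one file path per line from list_file and write a record for each."""
    with open(list_file, encoding="utf-8") as handle:
        for line in handle:
            path = line.strip()
            if not path:
                continue
            if not os.path.isfile(path):
                # Skip stale entries rather than abort the whole indexing run.
                print(f"skipping missing file: {path}", file=sys.stderr)
                continue
            with open(path, "rb") as doc:
                content = doc.read()
            out.write(format_prog_document(path, content, os.path.getmtime(path)))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Write bytes directly so Content-Length matches what is actually sent.
        emit_documents(sys.argv[1], sys.stdout.buffer)
    else:
        print("usage: list_to_prog.py LISTFILE", file=sys.stderr)
```

Assuming a 2.4.x release, this would be run as something like "swish-e -S prog -i ./list_to_prog.py -c swish.conf", with the list file name passed through the SwishProgParameters config directive. Check the SWISH-RUN and SWISH-CONFIG docs for your version before relying on that.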
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Tue Jul 8 22:01:30 2008