Re: [swish-e] index a list of files

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Jul 09 2008 - 02:01:32 GMT
Brad Bauer wrote on 7/8/08 8:44 PM:
> SWISH-E 2.2.1
> Linux www.domain.com 2.4.9-e.68 #1 Thu Jan 19 18:24:23 EST 2006 i686 unknown
>  

First, get an up-to-date version. 2.2 was last maintained over 5 years ago.

>  
> I have begun converting from fs to spidering, but find that downloading pdfs
> considerably slows the spidering process.  So what I would like to do is
> index html/php/cgi using the spider, at the same time building a list of
> local pdfs for indexing using the considerably faster fs method.  
>  
> Is there an easy way to feed a specific list of files into swish-e for
> indexing?

I'm guessing you are using -S http under swish-e 2.2. In the 2.4.x releases that
method is deprecated in favor of the spider.pl Perl script used together with
the -S prog method.

I would suggest using spider.pl to fetch and cache all your content, then using
the -S prog swish-e option to index the cache. Alternatively, you could configure
spider.pl to download only certain content types and make several spidering
runs, one cache per run. You can then either build one index per cache and
merge them later, or index all the caches into a single index.
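
For the "specific list of files" part of the question, a small -S prog feeder is one possible approach. What follows is only a rough sketch, not something shipped with swish-e: the script name feed_list.py, the list file pdfs.txt, and the function names are all made up for illustration. It relies on the -S prog input format, where each document is a block of headers (Path-Name and Content-Length, optionally Last-Mtime), a blank line, and then exactly Content-Length bytes of content. It does no PDF-to-text conversion, so it assumes the content is either already text or is converted by a filter set up elsewhere.

```python
import os
import sys


def emit_document(path, out):
    """Write one file to `out` as a -S prog document: headers, blank line, body."""
    with open(path, "rb") as fh:
        body = fh.read()
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # byte count of the body, not characters
        f"Last-Mtime: {int(os.path.getmtime(path))}\n"
        "\n"
    )
    out.write(headers.encode("utf-8"))
    out.write(body)


def emit_list(list_file, out):
    """Emit every existing file named in `list_file` (one path per line)."""
    count = 0
    for line in list_file:
        path = line.strip()
        if not path or not os.path.isfile(path):
            continue  # skip blank lines and paths that no longer exist
        emit_document(path, out)
        count += 1
    return count


# Hypothetical command-line use: feed_list.py pdfs.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as list_file:
        emit_list(list_file, sys.stdout.buffer)
```

Assuming the script is executable, its output could be piped in with something like "./feed_list.py pdfs.txt | swish-e -S prog -i stdin -c swish.conf", where swish.conf stands in for your own config file. Check the -S prog section of the docs for your release before relying on the header names above.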

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Tue Jul 8 22:01:30 2008