Re: [swish-e] index a list of files

From: Brad Bauer <bbauer(at)not-real.telstate.com>
Date: Wed Jul 09 2008 - 02:34:29 GMT
How hard is it to upgrade from a pre-2.4 release?  I got the impression it
would require quite a bit of rework to recreate our customizations.

I am using -S prog with spider.pl

RE: Caching - I am trying to avoid downloading PDFs, since that is very
time consuming compared to the fs method. (They do, after all, already exist
on the server.)  Using the spider takes 20+ minutes for only a small
section of the site, whereas using the fs setup I am able to index the
entire server in about 5 minutes.


B Bauer

-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
Sent: Tuesday, July 08, 2008 10:02 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] index a list of files



Brad Bauer wrote on 7/8/08 8:44 PM:
> SWISH-E 2.2.1
> Linux www.domain.com 2.4.9-e.68 #1 Thu Jan 19 18:24:23 EST 2006 i686 unknown
>  

first, get an up-to-date version. 2.2 was last maintained over 5 years ago.

>  
> I have begun converting from fs to spidering, but find that 
> downloading pdfs considerably slows the spidering process.  So what I 
> would like to do is index html/php/cgi using the spider, at the same 
> time building a list of local pdfs for indexing using the considerably
> faster fs method.
>  
> Is there an easy way to feed a specific list of files into swish-e for 
> indexing?

I'm guessing you are using -S http under swish-e 2.2. In the 2.4.x releases
that method is deprecated in favor of using the spider.pl Perl script in
conjunction with the -S prog method.

I would suggest using spider.pl to fetch and cache all your content, then
use the -S prog swish-e option to index the cache. Alternatively, you could
configure spider.pl to download only certain content types, and then make
multiple spidering runs, creating multiple caches, and then either create
multiple indexes for later merge, or index the multiple caches into a single
index.
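
For the "specific list of files" part of the question, a minimal sketch of a
-S prog feeder may help. It is written in Python only for illustration (the
spider itself is Perl), and it assumes the 2.4.x -S prog input format, where
each document is a block of headers (Path-Name, Content-Length, Last-Mtime),
a blank line, and then exactly Content-Length bytes of content. The names
emit_document and main, and the one-path-per-line list file, are hypothetical
and not part of swish-e.

```python
import os
import sys


def emit_document(path, out):
    """Write one file to `out` as headers, a blank line, then the raw bytes."""
    with open(path, "rb") as fh:
        body = fh.read()
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"          # byte count of the body
        f"Last-Mtime: {int(os.path.getmtime(path))}\n"
        "\n"                                      # blank line ends the headers
    )
    out.write(headers.encode("utf-8"))
    out.write(body)


def main(list_file, out=None):
    """Emit every existing file named in `list_file` (one path per line)."""
    if out is None:
        out = sys.stdout.buffer                   # binary stdout keeps PDFs intact
    with open(list_file) as fh:
        for line in fh:
            path = line.strip()
            if path and os.path.isfile(path):     # silently skip missing paths
                emit_document(path, out)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

The output would be handed to swish-e through -S prog in place of spider.pl.
PDFs sent this way still arrive as raw bytes, so a PDF-to-text filter would
have to be configured on the swish-e side; that part is not shown here.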

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users

Received on Tue Jul 8 22:34:44 2008