Re: A modularized view of a search engine

From: <moseley(at)not-real.hank.org>
Date: Thu Oct 09 2003 - 17:31:49 GMT
On Thu, Oct 09, 2003 at 05:00:44PM +0200, Magnus Bergman wrote:

> The main point: the job of retrieving a (fixed size, linear) document
> only needs to be implemented once for the whole system. Each and every
> program that needs this functionality can use the same code.

Good point.  Swish uses Perl's LWP code everywhere when it needs to grab 
a URL.  No reinventing the wheel there.

> I must admit that I haven't looked much at SWISH::Filter since I don't
> know Perl. Can it easily be used on the command line to convert
> documents? And can other command line filters easily be used with
> swish-e? (By easy I mean without writing any Perl code.)

Just because you don't know Perl doesn't mean it isn't easy.

Can SWISH::Filter be used at the command line?  Yes, in two ways: you 
can run Perl from the command line, or you can use the 
swish-filter-test program, which is a simple wrapper for 
SWISH::Filter.

So you want:

   url_list | url_fetch | swish-e

But that's not going to be very general purpose.  How are you going to 
get the content-type or other HTTP header data to swish-e?  You can do 
this with swish-e:

   spider.pl | swish-e -S prog -i stdin -c some.config

How would you modify that?
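
For reference, here is a rough sketch of what any program on the left 
side of that pipe has to print for each document.  It assumes the 
header-plus-body layout described in the swish-e docs for -S prog 
(Path-Name, Content-Length, an optional Document-Type, a blank line, 
then the content).  The emit_doc name and its arguments are made up for 
illustration, and mapping an HTTP content-type to a Document-Type value 
is left to the caller.

```shell
#!/bin/sh
# Sketch only: print one document in the form read by
# "swish-e -S prog -i stdin".  emit_doc is a hypothetical helper,
# not part of swish-e.
emit_doc() {
    path_name=$1    # URL or path to record in the index
    doc_type=$2     # parser hint such as HTML* or TXT* (assumed values)
    file=$3         # local file holding the already fetched content

    # Content-Length is a byte count; strip the padding some wc builds add.
    size=$(wc -c < "$file" | tr -d ' ')

    printf 'Path-Name: %s\n' "$path_name"
    printf 'Content-Length: %s\n' "$size"
    printf 'Document-Type: %s\n' "$doc_type"
    printf '\n'     # a blank line ends the headers
    cat "$file"     # the body follows, exactly $size bytes
}
```

Something like `emit_doc "$url" 'HTML*' page.html | swish-e -S prog -i stdin -c some.config` 
would then take the place of spider.pl for a single page.  A plain 
url_fetch that prints only the body has nowhere to put this kind of 
per-document data.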

Swish-e is nowhere near perfect, so I look forward to your input after 
you familiarize yourself with the docs.  The docs are not perfect 
either, so input there is helpful, too.


-- 
Bill Moseley
moseley@hank.org
Received on Thu Oct 9 17:36:38 2003