Re: Integrating an external index into SWISH-E?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Mar 01 2001 - 18:05:38 GMT
Hi Paul,

At 08:23 AM 03/01/01 -0800, Paul Sandwell wrote:
>Is there a way to insert our keyword index into SWISH-E? Our index
>contains the URL and a string of keywords for each drawing. I've
>searched the docs and mailing-list archives but haven't turned up
>anything suitable yet.

I hope I understand your situation.  You have a file that isn't in one
of the standard formats swish understands -- that is, it's not XML,
HTML, or plain text (well, maybe it is plain text, but you want to
search the keywords and not the URLs).

>As far as I can see, there are two ways I can tackle this:
>1) Create a meta-data file for each drawing from our index, and let
>Swish-E loose on those

Right.  The quick-n-easy solution would be to output your data as XML or
HTML and index that.
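
For example, each drawing could be dumped out as a tiny XML file,
something like this (the tag names here are made up -- use whatever
fits your data):

    <drawing>
        <url>http://example.com/drawings/widget-123.gif</url>
        <keywords>widget bracket assembly rev-C</keywords>
    </drawing>

Then point swish at the directory of those files and list the tags
under MetaNames in your config, so searches can be limited to the
keywords field.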

>2) Create a special URL which feeds info for one drawing at a time and
>have Swish-E keep hammering on that

You mean use the spidering feature of swish?  That might work if
whatever serves up those URLs is fast (the startup cost of Perl for
each document would be too high, I'd think).

What I'd like to see as a general feature is a way to tell swish to
open a pipe to a program and let that program feed documents to swish.
I have a need for this myself -- I use Perl's File::Cache (which uses
Storable to serialize Perl data structures) to store parsed documents.
I want to index those files, and if swish could call an external
program it would be simple to write a wrapper that reads them in and
sends just the data I want indexed to swish.  We'd need to figure out
some simple protocol for talking to swish, so that the file name or
URL, file size, last-modified date, content type, and so on could be
passed along with the document contents.
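
Off the top of my head, a feeder might look something like this sketch
(reading plain files rather than File::Cache entries, and the header
names are invented -- the protocol is exactly what we'd have to work
out):

    #!/usr/local/bin/perl -w
    use strict;

    # Sketch of a feeder: swish would run this program and read
    # documents from its stdout, one header block per document.
    # Header names below are made up for illustration.
    for my $path ( @ARGV ) {
        open FH, $path or die "$path: $!\n";
        my $content = do { local $/; <FH> };    # slurp whole file
        close FH;

        my $mtime = ( stat $path )[9];          # last-modified time

        print "Path-Name: $path\n",
              "Content-Length: ", length( $content ), "\n",
              "Last-Mtime: $mtime\n",
              "Content-Type: text/plain\n",
              "\n",
              $content;
    }

Swish would read the header block, then read Content-Length bytes of
document, then expect the next header block -- much like reading HTTP
responses.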

This design could be used for all sorts of things: indexing a MySQL
database, indexing gzipped or PDF documents, or spidering multiple
sites at the same time.
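
The MySQL case, for example, would just be a DBI loop in front of the
same made-up header format (table and column names invented for
illustration):

    use strict;
    use DBI;

    # Feed rows from a hypothetical drawings table to swish.
    my $dbh = DBI->connect( 'dbi:mysql:drawings', 'user', 'password',
                            { RaiseError => 1 } );
    my $sth = $dbh->prepare( 'SELECT url, keywords FROM drawings' );
    $sth->execute;

    while ( my ( $url, $keywords ) = $sth->fetchrow_array ) {
        print "Path-Name: $url\n",
              "Content-Length: ", length( $keywords ), "\n",
              "\n",
              $keywords;
    }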

Is that the kind of feature you would be looking for?

I think that's something to look at after 2.2 comes out.

How are your C skills?




Bill Moseley
mailto:moseley@hank.org
Received on Thu Mar 1 18:10:21 2001