Re: [SWISH-E:312] Re: Indexing file context?

From: Dan Brickley <Daniel.Brickley(at)>
Date: Fri Jun 05 1998 - 06:46:41 GMT

On Thu, 4 Jun 1998, Roy Tennant wrote:
> what I'm talking about. I'd much prefer to put abstracts in my files and
> fetch those. 

Me too. Has anybody done any work along these lines? E.g. building
something like a cache of extracted metadata from the indexed pages, so
that result-sets could include Title/Description/Keywords/Subject etc.
based on contents of META tags? Extracting these manually each time a
query occurs would presumably be a little inefficient.
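
A minimal sketch of what such a cache could look like, in Python purely
for illustration. The field list, the function names (extract_metadata,
build_cache) and the path -> HTML mapping are all assumptions made for
the example, not anything SWISH provides. The idea is to pull TITLE and
selected META tags out of each page once, at index time, and keep the
result keyed by path so a result-set can be decorated without re-parsing
the page on every query.

```python
from html.parser import HTMLParser

# Hypothetical set of META names worth keeping; adjust to taste.
WANTED = {"description", "keywords", "subject"}


class MetaExtractor(HTMLParser):
    """Collects <title> text and selected <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in WANTED and attrs.get("content"):
                self.record[name] = attrs["content"].strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.record["title"] = self.record.get("title", "").strip()

    def handle_data(self, data):
        if self._in_title:
            self.record["title"] = self.record.get("title", "") + data


def extract_metadata(html_text):
    """Return a dict of metadata found in one HTML document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.record


def build_cache(pages):
    """pages: mapping of path -> HTML text. Returns path -> metadata dict."""
    return {path: extract_metadata(text) for path, text in pages.items()}
```

The resulting dictionary could be written to disk next to the index
(for instance as JSON) and looked up by path when formatting results.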

Context: a couple of months ago I spent a while writing a simple wrapper
script so we can use SWISH indexed websites as WHOIS++ servers for
distributed searching. It would add a lot of functionality if we could
return slightly richer records... (just having abstracts would be great).

BTW, there are two reasons I'm using a relatively unfashionable
search'n'retrieval protocol (WHOIS++ instead of something like Z39.50).
Firstly, it's extremely simple, and secondly, there is support for
index-sharing and query routing. A SWISH WHOIS++ server can be POLLed for
copies of its index to allow other search clients to filter out queries
that won't stand any chance of finding anything. The idea is that this
should help scalability: e.g. cross-searching a number of sites without
each machine having to execute a query each time a search happens.
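
To make the filtering idea concrete, here is a rough sketch in Python.
It assumes, as a simplification, that each polled index has been reduced
to a plain set of lowercase words per server and that queries are
all-terms (AND) searches; the function name route_query and the data
layout are hypothetical and much cruder than what a WHOIS++ server
actually exchanges.

```python
def route_query(query, polled_indexes):
    """Return the servers worth contacting for an all-terms (AND) query.

    polled_indexes: mapping of server name -> set of lowercase words
    obtained earlier by polling that server (hypothetical structure).
    A server is skipped when any query term is missing from its word set,
    since it could not possibly return a hit.
    """
    terms = {t.lower() for t in query.split()}
    return sorted(
        server
        for server, words in polled_indexes.items()
        if terms <= words  # every query term appears in this server's index
    )
```

Only the servers returned need to be sent the real query; the rest are
never touched.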

On the extracting abstracts front, I'm not sure how best to go about this.
Ideally I'd like to do it in a separate Perl script instead of hacking the
C code, but then I need to get a list of target files from somewhere,
which means parsing the config files etc...
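
For what it's worth, a rough sketch of the "get a list of target files"
step, written in Python rather than Perl just for illustration. It
assumes the config uses IndexDir and IndexOnly directives in a simple
"Name value value ..." form with # comments, and it ignores everything
else (exclusion rules, symlink handling and so on), so treat it as a
starting point rather than a faithful reading of the config format. The
function names are made up for the example.

```python
import os


def read_directives(config_text):
    """Collect directive name -> list of values, skipping blanks and # comments."""
    directives = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *values = line.split()
        directives.setdefault(name.lower(), []).extend(values)
    return directives


def target_files(config_text):
    """List files under the IndexDir entries, filtered by IndexOnly suffixes."""
    d = read_directives(config_text)
    suffixes = tuple(s.lower() for s in d.get("indexonly", []))
    found = []
    for root in d.get("indexdir", []):
        if os.path.isfile(root):
            candidates = [root]
        else:
            candidates = [
                os.path.join(dirpath, name)
                for dirpath, _dirs, names in os.walk(root)
                for name in names
            ]
        for path in candidates:
            # With no IndexOnly line (assumed optional), keep every file.
            if not suffixes or path.lower().endswith(suffixes):
                found.append(path)
    return sorted(found)
```

The returned paths could then be fed to whatever does the abstract
extraction, without touching the C code.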

Any ideas?


Research and Development Unit                    tel: +44(0)117 9288478
Institute for Learning and Research Technology
University of Bristol,  Bristol BS8 1TN, UK.     fax: +44(0)117 9288473 
