At 05:58 -0800 15-11-2000, Bill Moseley wrote:
>OTOH, I'm not sure that this feature can't be handled outside of swish if
>Properties won't work in some case. It's faster to access the document
>summaries if they are in the index, but it might come at the expense of
>speed when searching -- and that is swish's main job.
>If using the file system you can always access the documents from your CGI
>front-end to show a summary of the first x characters. If indexing with
>the httpd method then maybe the spider can extract the first x characters
>and save it to a local file or database depending on your needs. That
>would be better with HTML as you could use HTML::TreeBuilder to extract out
>correct HTML instead of just chopping it off after x number of characters.
Lookup-1.6.0 has this 'outside' approach for summaries (abstracts),
for an example see:
Abstracts are generated at runtime by abstract.pm (attached),
based on code by Steve van der Burg, with a routine:
my $dhp = new HTML::DocHead;
open(THISFILE,$_) or return;
$content .= $_;
Steve originally used it for swishspider and stored the results in a gdbm file;
an AnyDBM_File construction would be more portable.
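
A minimal sketch of what such an AnyDBM_File construction might look like,
assuming abstracts are stored as plain strings keyed by document path. The
helper names store_abstracts and fetch_abstract are hypothetical and are not
taken from abstract.pm or from Steve's code. Note that some DBM back ends
limit the size of a record, so this suits short abstracts only.

```perl
use strict;
use warnings;
use Fcntl;          # exports O_RDWR, O_CREAT, O_RDONLY
use AnyDBM_File;    # uses the first DBM implementation it can load

# Hypothetical helper: write path => abstract pairs into a DBM file.
sub store_abstracts {
    my ($dbfile, %abstracts) = @_;
    my %db;
    tie(%db, 'AnyDBM_File', $dbfile, O_RDWR|O_CREAT, 0644)
        or die "Cannot open $dbfile: $!";
    $db{$_} = $abstracts{$_} for keys %abstracts;
    untie %db;
}

# Hypothetical helper: look up one abstract; returns undef if the
# DBM file cannot be opened or the path has no entry.
sub fetch_abstract {
    my ($dbfile, $path) = @_;
    my %db;
    tie(%db, 'AnyDBM_File', $dbfile, O_RDONLY, 0644) or return;
    my $abstract = $db{$path};
    untie %db;
    return $abstract;
}
```

The spider or indexer would call store_abstracts once per run, and search.cgi
would only need fetch_abstract for each hit it displays.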
The point is that you want to reduce the runtime load of search.cgi.
When the result set is over 25 files, extracting abstracts takes too
much time IMHO.
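
One way to keep that load bounded, sketched here under assumptions: only build
abstracts for the first 25 hits and leave the rest empty. The names
abstract_from_text and abstracts_for_results, the 200-character length, and
the crude tag stripping are all illustrative and are not what abstract.pm does.

```perl
use strict;
use warnings;

my $MAX_ABSTRACTS = 25;    # threshold mentioned above
my $ABSTRACT_LEN  = 200;   # assumed abstract length in characters

# Hypothetical helper: strip tags crudely, collapse whitespace, and cut
# the text to at most $len characters, ending on a whole word.
sub abstract_from_text {
    my ($text, $len) = @_;
    $text =~ s/<[^>]*>//g;     # crude tag removal, not a real HTML parser
    $text =~ s/\s+/ /g;        # collapse runs of whitespace
    $text =~ s/^ | $//g;       # trim leading and trailing space
    return $text if length($text) <= $len;
    my $cut = substr($text, 0, $len);
    $cut =~ s/\s+\S*$//;       # drop the last, possibly partial, word
    return "$cut...";
}

# Hypothetical helper: $read_file is a code ref that returns the content
# of a path (or undef). Only the first $MAX_ABSTRACTS hits are read.
sub abstracts_for_results {
    my ($paths, $read_file) = @_;
    my @out;
    my $n = 0;
    for my $path (@$paths) {
        my $abstract = '';
        if ($n++ < $MAX_ABSTRACTS) {
            my $content = $read_file->($path);
            $abstract = abstract_from_text($content, $ABSTRACT_LEN)
                if defined $content;
        }
        push @out, [$path, $abstract];
    }
    return \@out;
}
```

Passing the reader as a code ref means the same loop works whether the text
comes from the file system or from a precomputed DBM file.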
-- /''' Bas Meijer mailto:firstname.lastname@example.org
c-OO http://antraciet.com Web Services
\ > Kerkstraat 19 Postbus 256 1400 AG Bussum
\&& t. +31 35 7502100 f. +31 35 7502111
Received on Wed Nov 15 14:46:45 2000