On Sat, 12 Dec 1998, Dave Thomson wrote:
> There's been a couple of posts wondering how to display index results along
> with descriptions, and I guess this is one reason people end up sending
> Micro$oft and others thousands of their (company's?) dollars.
I fail to see what Microsoft has to do with search engines.
> It can actually be done very easily within swish-e.
Actually, it's done *outside* of SWISH-E. The problem is
simply: given a text or HTML file (regardless of how you
obtained its file name), print a description for it.
> I'm no perl expert, but here's what I've implemented to do this. It requires
> reading 2500 bytes for each hit (I'm banking on no html tags being open after
> 2500 bytes) so this is quite hungry at search time, but if you only do this
> for 'pages' of 25-hits or so, it works very well for local files. I'm sure
> someone can find a more efficient way to strip the html...
Yes: see my WWW.pm Perl module bundled with the SWISH++ (beta)
distribution. My code correctly handles both HTML and plain
sheets, and title text; also extracts text from ALT attributes
of IMG and AREA elements. It's also efficient (fast).
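
For illustration only, the approach described in the quoted message
(read the first 2500 bytes of each hit, strip the markup, print what
is left) might be sketched as below. This is a Python sketch, not the
code in WWW.pm and not the original poster's Perl; the function name
describe, the 200-character limit, and the Latin-1 decoding are all
assumptions made for the example.

```python
import re
from html import unescape


def describe(path, nbytes=2500, max_chars=200):
    """Return a short plain-text description for a text or HTML file.

    Hypothetical helper: reads only the first `nbytes` bytes, as in the
    quoted message, and strips markup with regular expressions.
    """
    with open(path, "rb") as f:
        # Latin-1 is an assumed encoding; it maps every byte, so it never fails.
        head = f.read(nbytes).decode("latin-1")

    # Remove comments first, so a ">" inside a comment is not taken as a tag end.
    head = re.sub(r"<!--.*?-->", " ", head, flags=re.S)
    # Remove complete tags.
    head = re.sub(r"<[^>]*>", " ", head)
    # The byte limit may cut a tag in half; drop the unclosed remainder.
    head = re.sub(r"<[^>]*\Z", " ", head)

    # Decode entities such as &amp; and collapse runs of whitespace.
    text = " ".join(unescape(head).split())
    return text[:max_chars]
```

Calling describe() once per hit on a page of 25 results keeps the cost
to at most 25 short reads. Note that a regex stripper like this keeps
title text and will mangle plain text that contains a literal "<", so
treat it as a rough sketch rather than a correct HTML parser.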
Received on Sun Dec 13 00:48:46 1998