Re: Displaying index results with summaries (no meta tags

From: Paul J. Lucas <pjl(at)not-real.ptolemy.arc.nasa.gov>
Date: Sun Dec 13 1998 - 08:48:08 GMT

On Sat, 12 Dec 1998, Dave Thomson wrote:

> There's been a couple of posts wondering how to display index results along
> with descriptions, and I guess this is one reason people end up sending
> Micro$oft and others thousands of their (company's?) dollars.

	I fail to see what Microsoft has to do with search engine
	results.

> It can actually be done very easily within swish-e.

	Actually, it's done *outside* of SWISH-E.  The problem is
	simply: given a text or HTML file (regardless of how you
	obtained its file name), print a description for it.

> I'm no perl expert, but here's what I've implemented to do this. It requires
> reading 2500 bytes for each hit (I'm banking on no html tags being open after
> 2500 bytes) so this is quite hungry at search time, but if you only do this
> for 'pages' of 25-hits or so, it works very well for local files. I'm sure
> someone can find a more efficient way to strip the html...

	Yes: see my WWW.pm Perl module bundled with the SWISH++ (beta)
	distribution.  My code correctly handles both HTML and plain
	text files, META elements, and comments; ignores JavaScript,
	style sheets, and title text; and extracts text from the ALT
	attributes of IMG and AREA elements.  It's also efficient (fast).
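
	For illustration only, here is a minimal sketch of that kind of
	extraction.  It is written in Python using the standard
	html.parser module, and it is not WWW.pm or its interface.  The
	names DescriptionExtractor and describe, the is_html flag, the
	25-word limit, and the choice to prefer a META description over
	body text are all assumptions made for this example.

```python
from html.parser import HTMLParser


class DescriptionExtractor(HTMLParser):
    """Collects words usable as a description (hypothetical helper)."""

    # Elements whose text content should not appear in a description.
    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta_description = None
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Assumption: an author-supplied description wins if present.
            self.meta_description = attrs.get("content") or self.meta_description
        elif tag in ("img", "area") and attrs.get("alt"):
            # ALT text stands in for the image in the running text.
            self.words.extend(attrs["alt"].split())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; the base class's
        # handle_comment discards them by default.
        if not self._skip_depth:
            self.words.extend(data.split())


def describe(text, is_html=True, max_words=25):
    """Return a short description for the contents of one file."""
    if not is_html:
        return " ".join(text.split()[:max_words])
    parser = DescriptionExtractor()
    parser.feed(text)
    parser.close()
    if parser.meta_description:
        return " ".join(parser.meta_description.split())
    return " ".join(parser.words[:max_words])
```

	Calling describe() on the contents of each hit's file, however
	the file name was obtained, yields one description per result.
	Because the text is parsed rather than cut at a fixed byte
	count, no guess about where tags are closed is needed.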

	- Paul
Received on Sun Dec 13 00:48:46 1998