Re: Re: Extracting descriptions

From: Paul J. Lucas <pjl(at)>
Date: Fri Dec 04 1998 - 23:01:33 GMT
On Fri, 4 Dec 1998, Jacques Delsemme wrote:

> 1. I had to increase the number of characters read to 2048 characters, 


> 2. By the same token, I've decreased the number of words returned to no 
> more than 50.

	I'll make this a parameter to the function.
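
A minimal sketch of what such a parameter could look like. The name `limit_words` and the default of 50 are hypothetical and not taken from the actual function:

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $max_words whitespace-separated
# words from $text. The default of 50 mirrors the figure quoted above.
sub limit_words {
    my ($text, $max_words) = @_;
    $max_words = 50 unless defined $max_words;
    my @words = split ' ', $text;      # split ' ' ignores leading whitespace
    $#words = $max_words - 1 if @words > $max_words;   # truncate in place
    return join ' ', @words;
}

print limit_words("one two three four five", 3), "\n";   # prints "one two three"
```

Passing the limit as an argument lets each caller choose its own cutoff without editing the function body.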

> 3. I've inserted the line:
> 	s/<!--.*-->//gi;                    # remove comment tags
> to remove comment tags.  I do this first.

	I don't understand why.  The line:


	in my code will also remove comments.  I don't see why it has to
	be done first.  Please explain.
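
For reference, a hedged sketch of comment stripping in general, not the code under discussion. In the quoted substitution, `.*` is greedy and `.` does not match a newline, so it can swallow text between two comments on one line and will miss a comment that spans lines. A non-greedy `.*?` with the `/s` modifier avoids both. The name `strip_comments` is hypothetical, and a regex like this is only an approximation of real HTML comment parsing:

```perl
use strict;
use warnings;

# Hypothetical helper: remove HTML comments from a string.
#   .*?  non-greedy, so each match stops at the nearest "-->"
#   /s   lets "." match newlines, so multi-line comments are removed
#   /g   removes every comment, not just the first
sub strip_comments {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    return $html;
}

my $page = "<!-- a -->keep<!-- b\nc -->";
print strip_comments($page), "\n";   # prints "keep"
```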

> 4. You are using the "description" meta tag to extract the description of the
> page.  Is this use universal?

	Probably not universal, but fairly common.  See:

	under "Provide keywords and descriptions."  Although it says:

		The value of the name attribute sought by a
		search engine is not defined by this specification.

	the example given uses "description".  For what it's worth,
	AltaVista uses "description"; see:

	Excite doesn't use META tags at all.  Hotbot points one to:

	that also uses "description".  (They also point you to a
	"Search Engine Features" page, but that page states that
	AltaVista doesn't use META tags, which is wrong.)

	There is also the "Dublin Core" set of names:

	All of their names start with "DC." so their description would
	look like:

		<META NAME="DC.description" CONTENT="blah blah">

	I've changed the regular expression in the Perl function to
	allow an optional "DC." before "description":

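As a rough sketch only (not the actual pattern from the function), an optional "DC." prefix could be matched along these lines. The name `extract_description` is hypothetical, and the pattern assumes that NAME precedes CONTENT and that both values are double-quoted:

```perl
use strict;
use warnings;

# Hypothetical sketch: return the CONTENT of a META description tag,
# accepting either "description" or "DC.description" as the NAME.
# (?:DC\.)? makes the Dublin Core prefix optional; /i ignores case.
sub extract_description {
    my ($html) = @_;
    if ($html =~ /<meta\s+name\s*=\s*"(?:DC\.)?description"\s+content\s*=\s*"([^"]*)"/i) {
        return $1;
    }
    return;    # no description found
}

print extract_description('<META NAME="DC.description" CONTENT="blah blah">'), "\n";
# prints "blah blah"
```
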

	- Paul
Received on Fri Dec 4 15:02:23 1998