Re: Extracting descriptions

From: Jacques Delsemme <jacques(at)not-real.cats.UCSC.EDU>
Date: Fri Dec 04 1998 - 21:24:18 GMT

Thanks for doing this.  I am testing your routine at one of our sites:

	http://www2.ucsc.edu/cats/sc/tools/search.shtml

and here are my experiences with it:

1. I had to increase the number of characters read to 2048; otherwise 
the extract often disappeared altogether once the various tags at the 
start of a document had been stripped (meta tags and other proprietary 
tags inserted automatically by some web editors).

2. At the same time, I've reduced the number of words returned to at 
most 50.

3. I've inserted the line:

	s/<!--.*-->//gi;                    # remove comments

to strip HTML comments.  I do this first.

4. You are using the "description" meta tag to extract the description 
of the page.  Is this use universal?  I'm curious to learn whether 
there is a well-defined standard (I plead ignorance about this), or 
whether there is a variety of meta tags in use (e.g. "abstract", 
"subject").

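A minimal sketch of how points 1 to 4 might fit together, in Perl like 
the line above.  This is not the routine under discussion: the helper 
names read_prefix, extract_description and limit_words, and the list of 
meta names tried, are assumptions made for illustration.  It uses a 
non-greedy .*? with the /s modifier, so each match stops at the first 
"-->" and may span several lines; a greedy .* would also delete any 
text lying between two comments on the same line.

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $nchars characters from the start
# of a file (point 1 uses 2048).
sub read_prefix {
    my ($path, $nchars) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    read($fh, my $buf, $nchars);
    close $fh;
    return defined $buf ? $buf : '';
}

# Hypothetical helper: keep at most $max_words whitespace-separated words.
sub limit_words {
    my ($text, $max_words) = @_;
    my @words = split ' ', $text;
    $#words = $max_words - 1 if @words > $max_words;
    return join ' ', @words;
}

# Hypothetical helper: build a short description from an HTML prefix.
sub extract_description {
    my ($html, $max_words) = @_;

    # Point 3: strip comments first (non-greedy, may span lines).
    $html =~ s/<!--.*?-->//gs;

    # Point 4: try several meta names in turn (an assumed list).
    for my $name (qw(description abstract subject)) {
        if ($html =~ /<meta\s+name\s*=\s*["']?$name["']?\s+content\s*=\s*"([^"]*)"/i) {
            return limit_words($1, $max_words);
        }
    }

    # Fallback: drop the head section, any tag cut off by the read
    # limit, and all remaining tags, then use the leading body text.
    $html =~ s/<head\b.*?<\/head>//is;
    $html =~ s/<[^>]*$//;
    $html =~ s/<[^>]*>/ /g;

    # Point 2: return no more than $max_words words.
    return limit_words($html, $max_words);
}
```

With these helpers, a call such as 
extract_description(read_prefix($file, 2048), 50) would apply the 
limits from points 1 and 2.  A comment that is still open when the 
2048-character limit is reached has no closing "-->" and is therefore 
left in place by this sketch.
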
- Jacques Delsemme (jacques@cats.ucsc.edu)
  Workstation Support - CATS
  University of California
  Santa Cruz, CA 95064
  (831) 459-2642
Received on Fri Dec 4 13:26:48 1998