On Fri, 4 Dec 1998, Jacques Delsemme wrote:
> 1. I had to increase the number of characters read to 2048 characters,
> 2. By the same token, I've decreased the number of words returned to no
> more than 50.
I'll make this a parameter to the function.
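
A minimal sketch of what that might look like, with both limits passed as
arguments. The function name and the defaults are illustrative only, not
the actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $max_chars characters from $fh and
# return at most $max_words whitespace-separated words as one string.
sub first_words {
    my ($fh, $max_chars, $max_words) = @_;
    $max_chars = 2048 unless defined $max_chars;   # assumed default
    $max_words = 50   unless defined $max_words;   # assumed default

    my $buf = '';
    read($fh, $buf, $max_chars);

    # split ' ' splits on runs of whitespace and drops leading whitespace
    my @words = split ' ', $buf;
    $#words = $max_words - 1 if @words > $max_words;
    return join ' ', @words;
}

# Usage with an in-memory filehandle standing in for a real file
my $text = "one two three four five";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
print first_words($fh, 2048, 3), "\n";
```
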
> 3. I've inserted the line:
> s/<!--.*-->//gi; # remove comment tags
> to remove comment tags. I do this first.
I don't understand why. The line:
in my code will remove comments also. I don't see why it has to
be done first. Please explain.
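
One thing worth noting about the quoted substitution, whatever the
ordering: ".*" is greedy and does not match newlines without /s, so it can
swallow text between two comments on one line and miss a comment that
spans lines. A small sketch of the difference, assuming the page is held
in a single string (the sample data and variable names are made up, and
this does not handle every SGML comment form):

```perl
use strict;
use warnings;

my $html = "<!-- a -->keep<!-- b -->\n<!-- c\nd -->end";

# Greedy form from the quoted message: on the first line, .* runs from
# the first "<!--" to the last "-->", so "keep" is removed as well, and
# the two-line comment is not matched at all.
(my $greedy = $html) =~ s/<!--.*-->//gi;

# Non-greedy form with /s: each comment is removed on its own, and "."
# is allowed to match newlines.
(my $lazy = $html) =~ s/<!--.*?-->//gs;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```
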
> 4. You are using the "description" meta tag to extract the description of the
> page. Is this use universal?
Probably not universal, but fairly common. See:
under "Provide keywords and descriptions." Although it says:
    The value of the name attribute sought by a
    search engine is not defined by this
    specification.
the example given uses "description." For what it's worth,
AltaVista uses "description"; see:
Excite doesn't use META tags at all. Hotbot points one to:
that also uses "description." (They also point you to a
"Search Engine Features" page, but that page states that
AltaVista doesn't use META tags, which is wrong.)
There is also the "Dublin Core" set of names:
All of their names start with "DC." so their description would be:
<META NAME="DC.description" CONTENT="blah blah">
I've changed the regular expression in the Perl function to
allow an optional "DC." before "description":
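
Roughly, a pattern of that kind could look like the sketch below. It
assumes that NAME comes before CONTENT and that both values are
double-quoted; the function name is hypothetical and this is not
necessarily the exact expression in the function:

```perl
use strict;
use warnings;

# Hypothetical helper: return the CONTENT of a "description" or
# "DC.description" META tag, or nothing if no such tag is found.
sub meta_description {
    my ($html) = @_;
    # (?:DC\.)? makes the "DC." prefix optional; /i ignores case
    if ($html =~ /<META\s+NAME\s*=\s*"(?:DC\.)?description"\s+CONTENT\s*=\s*"([^"]*)"/i) {
        return $1;
    }
    return;
}

print meta_description('<META NAME="DC.description" CONTENT="blah blah">'), "\n";
```

The same call also matches a plain NAME="description" tag, since the
prefix group is optional.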
Received on Fri Dec 4 15:02:23 1998