Re: Displaying index results with summaries (no meta tags

From: Yann Stettler <stettler(at)>
Date: Sat Dec 12 1998 - 22:07:36 GMT
Dave Thomson wrote:

>    open(FILE,"$file") || print "error: $file";
>    while ($chars <2500)
>      {
>      $mychar = getc(FILE);

I didn't look at the rest, but if I were you I would read the
2500 chars directly with a single read() call instead of doing
2500 getc() calls and concatenating each character onto a string...
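
A minimal sketch of the single-read idea, assuming the file is plain
text on disk. The name read_head and its arguments are hypothetical,
not something from Dave's script:

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $limit characters from the start of $file.
sub read_head {
    my ($file, $limit) = @_;
    open(my $fh, '<', $file) or die "error: $file: $!";
    my $buffer = '';
    # read() fills $buffer and returns the count read (0 at EOF, undef on error).
    my $got = read($fh, $buffer, $limit);
    close($fh);
    die "read failed on $file: $!" unless defined $got;
    return $buffer;
}
```

Something like read_head($file, 2500) would then replace the whole
getc() loop; a file shorter than the limit simply comes back whole.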

> $junk =~ s/<(([^>]|\n)*)>//g;

This won't work for a comment that contains an HTML tag, because
the match stops at the first ">" it finds inside the comment.
E.g.: <!-- This is a <B>comment</B> -->
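
To make the failure concrete, here is a small sketch that runs the
quoted regex on that kind of input. The sub names and the sample
string are made up for illustration, and the comment-stripping
pattern in the second sub is an assumption of mine that does not
cover every odd comment syntax:

```perl
use strict;
use warnings;

# The quoted substitution, wrapped in a hypothetical helper.
sub strip_tags_naive {
    my ($junk) = @_;
    $junk =~ s/<(([^>]|\n)*)>//g;
    return $junk;
}

# Assumed fix: remove whole comments first, then the remaining tags.
sub strip_comments_then_tags {
    my ($junk) = @_;
    $junk =~ s/<!--.*?-->//gs;   # /s lets a comment span several lines
    $junk =~ s/<[^>]+>//g;
    return $junk;
}

my $sample = 'Intro <!-- This is a <B>comment</B> --> text';
print strip_tags_naive($sample), "\n";          # Intro comment --> text
print strip_comments_then_tags($sample), "\n";  # Intro  text
```

The first result still carries the comment's text and its closing
"-->", which is exactly the leftover you don't want in a summary.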

I use a "little" more complex method myself:


   # Read first 4KB
   read(S, $cbuffer, 4096);
   # We work on the whole buffer. Let's remove newlines.
   $cbuffer=~s/\n/ /g;
   # I don't want the head : Title tag is already in the DB anyway.
   # Discard comments.
   $cbuffer=~s/\0/ /g;
   # Discard other html tags
   $cbuffer=~s/<[^>]+>/ /g;
   # There is no use displaying runs of spaces. One is enough.
   $cbuffer=~s/\s+/ /g;
   # Keep the first few characters
   $context=substr($cbuffer, 0, 520);
   # Remove the last "word" in case it was cut

I'm not saying that it's foolproof... but it should behave properly
in most cases... (and even with all those regexps, it should still
be faster than doing several thousand getc() calls... :)
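
For anyone who wants to try this end to end, here is a hedged sketch
of the whole pipeline as one sub. The listing above has no code under
the "head", "comments" and "last word" remarks, so the patterns used
for those three steps below are my assumptions, as are the name
summarize, the optional length argument and the trimming of leading
and trailing spaces. The tag and whitespace substitutions and the
520-character cut are the ones shown above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a chunk of HTML into a short plain-text summary.
sub summarize {
    my ($cbuffer, $max) = @_;
    $max = 520 unless defined $max;
    # Newlines become spaces so the patterns below can span old line breaks.
    $cbuffer =~ s/\n/ /g;
    # Drop the head section (assumed pattern).
    $cbuffer =~ s/<head\b.*?<\/head\s*>/ /gi;
    # Drop comments before tags, so tags inside comments go too (assumed pattern).
    $cbuffer =~ s/<!--.*?-->/ /g;
    # Discard other HTML tags (from the listing above).
    $cbuffer =~ s/<[^>]+>/ /g;
    # Collapse runs of whitespace (from the listing above).
    $cbuffer =~ s/\s+/ /g;
    # Trim the ends (an addition, not in the listing above).
    $cbuffer =~ s/^\s+|\s+$//g;
    # Keep the first few characters.
    my $context = substr($cbuffer, 0, $max);
    # If the text was cut, drop the last "word" (assumed pattern). This also
    # drops a complete word when the cut happened to fall on a word boundary.
    $context =~ s/\s+\S*$// if length($cbuffer) > $max;
    return $context;
}
```

You would feed it the buffer from the 4KB read, e.g.
$context = summarize($cbuffer);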

By the way, I don't like working with the filesystem method... It
would be rather awkward to password-protect an area of the site only
to display the content of those files in the results of a search :)
(that's just one problem among others)

Yann S.

Received on Sat Dec 12 14:04:14 1998