Rainer wrote:
>If you don't want to use these "features" a search engine may offer some
>switches, that a description or the first 100 words will not be put in
>the database...
>
>Doing such things via perl is not the way to go, because it's too slow
I use Perl to do precisely this on two web sites I manage, and it's not slow.
Have a look at the search facilities on http://www.republic.org.au and
http://www.ausflag.com.au. The Perl routine parses the Swish-E output,
extracts 100-word citations from the referenced files, and formats the
output AltaVista-style. As the referenced documents are on the local file
system, it's very quick.
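
For anyone wanting to try the approach, here is a rough sketch of the idea in Python. It is not the Perl routine running on those sites. It assumes each Swish-E result line looks like `rank path "title" size`, and it treats everything else (header comments, the closing "." line) as noise. The names parse_swish_output, excerpt, format_results and the read_file callback are made up for illustration.

```python
import re
from html import unescape

# Assumed result-line shape: rank, path, quoted title, size in bytes.
RESULT_LINE = re.compile(r'^(\d+)\s+(\S+)\s+"(.*)"\s+(\d+)\s*$')


def parse_swish_output(text):
    """Return (rank, path, title) tuples; lines that do not match are skipped."""
    hits = []
    for line in text.splitlines():
        m = RESULT_LINE.match(line)
        if m:
            hits.append((int(m.group(1)), m.group(2), m.group(3)))
    return hits


def excerpt(html_text, max_words=100):
    """Crudely strip markup and return the first max_words words."""
    # Drop script/style bodies first, then any remaining tags.
    no_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html_text)
    plain = unescape(re.sub(r"(?s)<[^>]*>", " ", no_scripts))
    words = plain.split()
    cut = " ".join(words[:max_words])
    return cut + (" ..." if len(words) > max_words else "")


def read_local(path):
    """Default reader: fetch the document from the local file system."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def format_results(swish_text, read_file=read_local, max_words=100):
    """Build a plain-text listing: title, citation, then path and rank."""
    blocks = []
    for rank, path, title in parse_swish_output(swish_text):
        citation = excerpt(read_file(path), max_words)
        blocks.append(f"{title}\n{citation}\n{path} (rank {rank})")
    return "\n\n".join(blocks)
```

The citation is built at query time from the file itself, so the index only has to store the path. Taking the first 100 words is the simplest choice; a fancier version could pick the words around the search term instead.
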
These two web sites get around 50k hits per week, and have peaked at over 300k
hits in one day, and I've never had a complaint that the search engine is
slow.
Putting 100 word citations in the index is unnecessary. It would duplicate
information already in the file system and greatly increase the size of the
index.
Cheers,
--
Dr Brendan Jones |
Visiting Fellow |
Electronics Department |
Macquarie University | Email: brendan@mpce.mq.edu.au
NSW 2109 AUSTRALIA | WWW : http://www.mpce.mq.edu.au/~brendan/
Received on Tue Dec 1 17:31:12 1998