RE: Formatting the output from Swish-E

From: Bas Meijer <bas(at)not-real.antraciet.nl>
Date: Wed Dec 06 2000 - 10:15:50 GMT
Hi,


Lookup (a search engine written in Perl, based on swish-e 1.3.3) includes
an HTML parser module (also Perl) that extracts the first 300 bytes of
text from an HTML page.
Lookup extracts these abstracts at search time, reading the files from the
file system. Another approach is to use a modified spider that stores the
abstracts in a GDBM file at indexing time, which is more efficient:
http://www.lhsc.on.ca/swish-e/
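
To illustrate the two approaches above, here is a minimal sketch in Python (not Lookup's actual Perl code, and not the modified spider's). It strips the markup from a page, keeps the first 300 bytes of text as an abstract, and optionally stores that abstract in a DBM file keyed by document path. The names make_abstract, store_abstract and load_abstract are hypothetical, and the standard dbm.dumb module stands in for GDBM so the example runs anywhere; dbm.gnu could be substituted where it is installed.

```python
import dbm.dumb
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def make_abstract(html, limit=300):
    """Return at most `limit` bytes (UTF-8) of the page's text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the abstract is a single line.
    text = " ".join(" ".join(parser.chunks).split())
    # Cut on a byte boundary, dropping any partial multi-byte character.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def store_abstract(db_path, doc_path, abstract):
    """Index-time variant: save the abstract under the document path."""
    with dbm.dumb.open(db_path, "c") as db:
        db[doc_path] = abstract


def load_abstract(db_path, doc_path):
    """Search-time lookup of a previously stored abstract."""
    with dbm.dumb.open(db_path, "r") as db:
        return db[doc_path].decode("utf-8")
```

Calling make_abstract on each result file at search time corresponds to what Lookup does; calling store_abstract once per page while spidering and load_abstract when displaying results corresponds to the stored-abstract approach, which avoids re-parsing every hit on every search.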

Lookup at http://bas.antraciet.nl/lookup





Bas Meijer


>Has anyone done a HTML library for outputting and parsing HTML documents?
>
>>  -----Original Message-----
>>  From: swish-e@sunsite.berkeley.edu
>>  [mailto:swish-e@sunsite.berkeley.edu]On Behalf Of Luke Ross
>>  Sent: Wednesday, 6 December 2000 04:18
>>  To: Multiple recipients of list
>>  Subject: [SWISH-E] RE: Formatting the output from Swish-E
>>
>>
>>  Hi
>>
>>  On Sun, 3 Dec 2000, Patrick Dunford wrote:
>>
>>  > A third option might be to have PHP parse each returned file
>>  and extract the
>>  > HTML from the file... haven't looked at this in detail but
>>  theoretically it
>>  > might be possible.
>>
>>  I looked at this, but it was nigh-on impossible for server-parsed and
>>  included files :)
>>
>>  Regards,
>>
>>  Luke
>>  --
>>  Luke Ross (Fizzy Razzer) - lukeross@sys3175.co.uk
>>  Visit http://lcr.sys3175.co.uk for geek code, other addresses,
>>  web page etc.
>>
>>

-- 


--  /'''     Bas Meijer, Antraciet
     c-OO     WEB: http://bas.antraciet.nl WAP: http://wmpp.net
     \  >     Kerkstraat 19 Postbus 256 1400 AG Bussum.NL
      \&&     tel. +31 35 7502100  fax. +31 35 7502111
Received on Wed Dec 6 10:18:24 2000