
RE: Good Excel parser

From: Roubart Capcap <RCapcap(at)not-real.scif.com>
Date: Wed May 28 2003 - 16:03:15 GMT
Sorry, the code was in my spider.config, not spider.pl.

-----Original Message-----
From: Roubart Capcap 
Sent: Wednesday, May 28, 2003 9:00 AM
To: Multiple recipients of list
Subject: [SWISH-E] Good Excel parser


Hello,

Does anybody know of a good Excel parser?  I tried the Swish Filters with the following code in my spider.pl:

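# add the swish-e filters directory to @INC so XLtoHTML can be loaded
use lib '/swish-e-2.2.3/filters/SWISH/Filters';
use XLtoHTML;

# content filter: receives the URI, the server's config hash,
# the HTTP::Response object and a reference to the document content
sub xl {
   my ( $uri, $server, $response, $content_ref ) = @_;
   # leave anything not reported as Excel untouched
   return 1 unless $response->content_type eq 'application/vnd.ms-excel';
   # for logging counts
   $server->{counts}{'XLS transformed'}++;
   # replace the raw workbook with the HTML returned by XLtoHTML
   # (this code expects XLtoHTML to take and return a scalar reference)
   $$content_ref = ${XLtoHTML( $content_ref )};
   # squeeze runs of spaces down to one
   $$content_ref =~ tr/ / /s;
   return 1;
}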

I tried the above, but most of the Excel documents were not indexed.
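One possible cause, and it is only a guess, is the exact `eq 'application/vnd.ms-excel'` test: a server that labels .xls files with another type (for example `application/x-msexcel`) would have those files skipped without any error. The sketch below assumes that. `is_excel()`, `count_types()` and the list of MIME types are hypothetical, not part of swish-e. `count_types()` follows the same argument shape as `xl()` and only records which content types arrived, reusing the `$server->{counts}` idea from the original filter. How it gets hooked into the spider config is left as an assumption to check against the spider.pl documentation.

```perl
use strict;
use warnings;

# Hypothetical list of MIME types that web servers may use for Excel
# workbooks; extend it with whatever your server actually reports.
my %EXCEL_TYPES = map { $_ => 1 } qw(
    application/vnd.ms-excel
    application/excel
    application/x-excel
    application/x-msexcel
);

# True if a Content-Type value names an Excel workbook. Any parameters
# (e.g. "; charset=...") are dropped and the comparison ignores case.
sub is_excel {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    my ($base) = split /;/, $content_type, 2;
    $base = '' unless defined $base;
    $base =~ s/^\s+|\s+$//g;
    return $EXCEL_TYPES{ lc $base } ? 1 : 0;
}

# Same ($uri, $server, $response, $content_ref) shape as xl() above, but it
# only records counts: every content type seen, plus how many look like Excel.
sub count_types {
    my ( $uri, $server, $response, $content_ref ) = @_;
    my $type = $response->content_type || 'unknown';
    $server->{counts}{"Type: $type"}++;
    $server->{counts}{'Excel candidates'}++ if is_excel($type);
    return 1;    # never rejects a document
}
```

Inside `xl()` the check would then become `return 1 unless is_excel( $response->content_type );`. In scalar context HTTP::Response's `content_type` should already return the lowercased type without parameters, so the stripping in `is_excel()` is only a safety net for raw header strings.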

Roubart Capcap
Received on Wed May 28 16:03:21 2003