RE: Good Excel parser

From: Roubart Capcap <RCapcap(at)>
Date: Wed May 28 2003 - 22:43:21 GMT
If I download the Excel file and test it, I get this:

[Bart]$ perl -I.. test adr03rates.xls
Testing mode for

File: adr03rates.xls
Content-type: application/excel



If I use SWISH::Filter (with Spreadsheet::ParseExcel), it seems to try to parse the file, but with errors:

19796 Warning - http://localhost/2003/adr03rates.xls: substr outside of string at /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ParseExce line 1253.

19780 Warning - http://localhost/2003/adr03rates.xls: Use of uninitialized value in unpack at /usr/local/lib/perl5/site_perl/5.8.0/Spreadsheet/ line 1253.

Summary for: http://localhost/2003/adr03rates.xls
    Skipped: 1  (0.0/sec)
Unique URLs: 1  (0.0/sec)

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!

I am not sure whether the ParseExcel module is causing the problem.  Please help.
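
One way to separate the two might be to run Spreadsheet::ParseExcel on the downloaded file outside of swish-e. The following is only a minimal sketch, not part of swish-e: the helper name workbook_text is made up for illustration, and it assumes a Spreadsheet::ParseExcel release that provides the parse(), error(), worksheets(), row_range(), col_range() and get_cell() interface (older releases use a different API). If the same substr/unpack warnings appear here, the module or the file itself is the more likely culprit.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every non-empty cell value from a
# parsed workbook object (anything that offers the methods used below).
sub workbook_text {
    my ($workbook) = @_;
    my @values;
    for my $sheet ( $workbook->worksheets ) {
        my ( $row_min, $row_max ) = $sheet->row_range;
        my ( $col_min, $col_max ) = $sheet->col_range;
        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {
                my $cell = $sheet->get_cell( $row, $col ) or next;
                my $value = $cell->value;
                push @values, $value if defined $value && length $value;
            }
        }
    }
    return @values;
}

# Only touch Spreadsheet::ParseExcel when a file name is given.
if (@ARGV) {
    require Spreadsheet::ParseExcel;
    my $parser   = Spreadsheet::ParseExcel->new;
    my $workbook = $parser->parse( $ARGV[0] );
    die "Parse failed: ", $parser->error, "\n" unless defined $workbook;

    my @values = workbook_text($workbook);
    printf "%d non-empty cells\n", scalar @values;

    # Show at most the first ten values as a sanity check.
    my $last = $#values < 9 ? $#values : 9;
    print "$_\n" for @values[ 0 .. $last ];
}
```

Saved under any name (say check_xls.pl, also hypothetical), it would be run as "perl check_xls.pl adr03rates.xls". A count of zero cells would match the "No unique words indexed!" error above.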

-----Original Message-----
From: Bill Moseley []
Sent: Wednesday, May 28, 2003 11:00 AM
To: Roubart Capcap
Cc: Multiple recipients of list
Subject: Re: [SWISH-E] Good Excel parser

On Wed, May 28, 2003 at 09:00:19AM -0700, Roubart Capcap wrote:
> Hello,
> Does anybody know of a good Excel parser?  I tried the Swish Filters
> with the following code in my
> use lib '/swish-e-2.2.3/filters/SWISH/Filters';
> use XLtoHTML;
> sub xl {
>    my ( $uri, $server, $response, $content_ref ) = @_;
>    return 1 unless $response->content_type eq 'application/';
>    # for logging counts
>    $server->{counts}{'XLS transformed'}++;
>    $$content_ref = ${XLtoHTML( $content_ref )};
>    $$content_ref =~ tr/ / /s;
>    return 1;
> }

I assume you have Spreadsheet::ParseExcel installed?  I also don't know 
if you can call XLtoHTML() directly.  You should call it from 
SWISH::Filter.  See

There's also a "TESTING" section that shows how to test the filter 
outside of swish-e or

It says:

[This module can be run as a program directly. Change directory 
to the location of the module and run:

  perl -I.. test  foo.pdf  bar.doc

replace foo.pdf and bar.doc with real paths on your system. The -I.. is 
needed for loading the filter modules.]

You don't really have to change directory to the location of  
You can run it from any directory.  For example:

   perl -I/home/moseley/swish-e/filters \
        /home/moseley/swish-e/filters/ \

That should show you if the filtering is working.

BTW -- The new version of Swish-e has a filter (a SWISH::Filter) that
uses the Perl module Spreadsheet::ParseExcel (available from CPAN).  The
new will automatically use it if you have
Spreadsheet::ParseExcel installed.
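
To confirm whether Spreadsheet::ParseExcel is visible to the perl that runs the spider, something like the following rough sketch could be used. It relies only on core Perl; the helper name module_version is invented here and is not part of swish-e or SWISH::Filter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the version of a module if it can be
# loaded, 'unknown' if it loads without a version, or nothing if it
# cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(Spreadsheet::ParseExcel)) {
    my $version = module_version($module);
    print defined $version
        ? "$module $version is installed\n"
        : "$module is NOT installed (or not in \@INC)\n";
}
```

It should be run with the same perl binary and the same -I paths as the filter test, since a module installed for a different perl will not be found.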

Bill Moseley
Received on Wed May 28 22:43:28 2003