RE: Filtering

From: Roubart Capcap <RCapcap(at)>
Date: Fri Jul 25 2003 - 23:12:45 GMT
The problem is, on some files the XLtoHTML filter does not seem to work, but xls2csv does.  Is there a way to filter using xls2csv if XLtoHTML did not work?

-----Original Message-----
From: Bill Moseley
Sent: Friday, July 25, 2003 12:51 PM
To: Roubart Capcap
Cc: Multiple recipients of list
Subject: Re: [SWISH-E] Filtering

On Fri, Jul 25, 2003 at 12:24:37PM -0700, Roubart Capcap wrote:
> I am planning to add xls2csv as another filter to parse MS Excel files besides the existing filter.  I copied it and made the following changes:
> package SWISH::Filters::xls2csv;
> use vars qw/ %FilterInfo $VERSION /;
> $VERSION = '0.01';
> %FilterInfo = (
>     type     => 2,  # normal filter
>     priority => 50, # normal priority 1-100
> );
> sub filter {
>     my $filter = shift;
>     # Do we care about this document?
>     return unless $filter->content_type =~ m!application/!;
>     # We need a file name to pass to the xls2csv program
>     my $file = $filter->fetch_filename;
>     # Grab output from running program
>     my $content = $filter->run_program( 'xls2csv', $file );
>     # update the document's content type
>     $filter->set_content_type( 'text/plain' );
> How and where do I specify that xls files should be parsed by both
> filters?

Both filters?  If you are converting to csv then you wouldn't want the 
other Excel filter to process it, would you?

Anyway, the type and priority are what set the sort order of the 
filters.  If you want other filters to still process the document 
after yours, instead of finishing there, call 
$filter->set_continue.  (All this is from looking at the docs, since I 
can't remember how it works....)
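
As a rough sketch of that idea (this is a standalone model, not the actual SWISH::Filters code): the filters below are plain hashes, the ordering rule "lower type first, then lower priority" is an assumption, and the `continue` flag only stands in for what $filter->set_continue is described as doing.  The filter bodies and the file name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for filter modules (not SWISH::Filters objects).
my @filters = (
    { name => 'xls2csv',  type => 2, priority => 60,
      run  => sub { my $doc = shift; return "csv:$doc" } },
    { name => 'XLtoHTML', type => 2, priority => 50,
      run  => sub { return } },    # pretend this one fails on the file
);

# Assumed ordering: sort by type, then by priority.
my @ordered = sort {
    $a->{type} <=> $b->{type} || $a->{priority} <=> $b->{priority}
} @filters;

# Try each filter in order.  A filter that returns nothing is skipped.
# The chain stops after the first filter that produces output, unless
# that filter has its (hypothetical) continue flag set.
sub run_chain {
    my ( $doc, @chain ) = @_;
    my @used;
    for my $f (@chain) {
        my $out = $f->{run}->($doc);
        next unless defined $out;
        push @used, $f->{name};
        $doc = $out;
        last unless $f->{continue};
    }
    return ( \@used, $doc );
}

my ( $used, $content ) = run_chain( 'report.xls', @ordered );
print "filters used: @$used\n";
print "content: $content\n";
```

In this model the filter with the lower priority number is tried first, and the second one only gets the document because the first returned nothing.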

> And how do I specify that the output of xls2csv should be
> parsed by the TXT2 parser?

The way swish works normally is by mapping file extensions to the 
parser.  That's not a very good way to go, of course.  Someday I'll add 
processing by content-type internal to swish (or that's been the plan 
for a while).  But if using -S prog you can set the parser in a header.

I see this in

    # Set the parser type if specified by filtering
    if ( my $type = delete $server->{parser_type} ) {
        $headers .= "Document-Type: $type\n";

    } elsif ( $response->content_type =~ m!^text/(html|xml|plain)! ) {
        $type = $1 eq 'plain' ? 'txt' : $1;
        $headers .= "Document-Type: $type*\n";

So it's setting a Document-Type: header to select the parser.
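
For a program of your own run with -S prog, something along these lines might do it.  This is only a minimal sketch: prog_record is a made-up helper name, the path and CSV text are placeholders for real xls2csv output, and it assumes the content is a plain byte string so that length() gives the right Content-Length.

```perl
use strict;
use warnings;

# Build one document record for swish-e's -S prog input:
# header lines, a blank line, then the content itself.
sub prog_record {
    my ( $path, $content, $parser ) = @_;
    my $headers = "Path-Name: $path\n"
                . 'Content-Length: ' . length($content) . "\n";
    # Only name a parser when the caller asked for one.
    $headers .= "Document-Type: $parser\n" if defined $parser;
    return $headers . "\n" . $content;
}

# Hypothetical path and CSV text standing in for xls2csv output.
print prog_record( 'docs/report.xls', "a,b,c\n1,2,3\n", 'TXT2' );
```

Leaving the third argument off drops the Document-Type: line, so swish falls back to its usual way of picking the parser.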

Does that help?

Bill Moseley
Received on Fri Jul 25 23:12:55 2003