wvare DOC -> HTML filter

From: Job, James <JJob(at)not-real.ESD.WA.GOV>
Date: Sun May 23 2004 - 03:33:46 GMT
Here is what it took (on Win2003 Server using SWISH-E 2.4.2):

1.  Download & Install "Complete package, except sources" (setup) from
http://gnuwin32.sourceforge.net/packages/wv.htm (2 MB, Jan 2004).
2.  Add "C:\Program Files\GnuWin32\bin" to your system PATH.
3.  Create Doc2html.pm in c:\swish-e\lib\swish-e\perl\SWISH\filters\ as
follows (a quickly hacked Doc2txt.pm):

#================================================================================
package SWISH::Filters::Doc2html;
use vars qw/ @ISA $VERSION /;

$VERSION = '0.01';
@ISA = ('SWISH::Filter');
sub new {
    my ( $pack, %params ) = @_;

    my $self = bless {
        name => $params{name} || $pack,
    }, $pack;


    # check for helpers
    for my $prog ( qw/ wvware / ) {
        my $path = $self->find_binary( $prog );
        unless ( $path ) {
            $self->mywarn("Can not use Filter $pack -- need to install $prog");
            return;
        }   
        $self->{$prog} = $path;
    }

    return $self;

}

# The object is the first argument ($_[0]), not the topic variable $_
sub name { $_[0]->{name} || 'unknown' }



sub filter {
    my ( $self, $filter) = @_;

    # Do we care about this document?
    return unless $filter->content_type =~
        m!(application/x-msword|application/msword)!;

    # We need a file name to pass to the wvware program
    my $file = $filter->fetch_filename;
    
    # Grab output from running program
    my $content = $filter->run_program( $self->{"wvware"}, "-1", "$file" );
    return unless $content;

    # update the document's content type
    $filter->set_content_type( 'text/html' );

    # return the document
    return \$content;
}
1;

__END__

=head1 NAME

SWISH::Filters::Doc2html - Perl extension for filtering MSWord documents with Swish-e

=head1 DESCRIPTION

This is a plug-in module that uses the "wvware" program to convert MS Word
documents to HTML for indexing by Swish-e.  "wvware" can be downloaded from:

    http://www.wvware.com

The program "wvware" must be installed and in your PATH before running
Swish-e.  This has been tested only under Win32, using the binary package
from http://gnuwin32.sourceforge.net/packages/wv.htm

=head1 SEE ALSO

L<SWISH::Filter>
#================================================================================

Test it on an unsuspecting .DOC file using the swish-filter-test script
(with the "-content" option).
Note the "-1" command line option that the filter passes to wvware.  This
prevents the wvware program from creating WMF files on your system.
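
If the filter gets skipped with the "need to install wvware" warning, the
PATH change from step 2 probably has not taken effect. Here is a rough sketch
for checking that outside of Swish-e, using only core Perl modules. Note that
find_in_path is a helper I made up for illustration; it is not part of
SWISH::Filter and may not match exactly how find_binary searches.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filter): look for a program
# in each directory listed in PATH and return the first match.
sub find_in_path {
    my ($prog) = @_;

    # On Windows, also try the extensions listed in PATHEXT (.EXE, .BAT, ...).
    my @exts = ('');
    push @exts, split /;/, ( $ENV{PATHEXT} || '.EXE;.BAT;.CMD' )
        if $^O eq 'MSWin32';

    for my $dir ( File::Spec->path ) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile( $dir, $prog . $ext );
            return $candidate if -f $candidate;
        }
    }
    return;    # not found
}

my $path = find_in_path('wvware');
print defined $path ? "found: $path\n" : "wvware not found on PATH\n";
```

Run it from a newly opened command prompt so the updated PATH is picked up.
On a case-sensitive file system the binary name may need to be spelled
exactly as it was installed.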

Wvware seems very good at translating .doc to HTML (or XML with options).
After using it at the command line (redirecting output to doc.htm), I was
impressed by how it handled the test docs I threw at it.

James Job, MCSE, MCP+I
Washington State Employment Security Department
Webmaster



Received on Sat May 22 20:33:48 2004