Here is what it took (on Win2003 Server using SWISH-E 2.4.2):
1. Download and install the "Complete package, except sources" (setup) from
http://gnuwin32.sourceforge.net/packages/wv.htm (about 2 MB, Jan 2004).
2. Add "C:\Program Files\GnuWin32\bin" to your system PATH.
3. Create Doc2html.pm in c:\swish-e\lib\swish-e\perl\SWISH\filters\ as
follows (a quickly hacked Doc2txt.pm):
#================================================================================
package SWISH::Filters::Doc2html;
use vars qw/ @ISA $VERSION /;
$VERSION = '0.01';
@ISA = ('SWISH::Filter');

sub new {
    my ( $pack, %params ) = @_;
    my $self = bless {
        name => $params{name} || $pack,
    }, $pack;

    # check for helpers
    for my $prog ( qw/ wvware / ) {
        my $path = $self->find_binary( $prog );
        unless ( $path ) {
            $self->mywarn("Can not use Filter $pack -- need to install $prog");
            return;
        }
        $self->{$prog} = $path;
    }
    return $self;
}
# $_[0] is the object; plain $_ is not set inside a method call
sub name { $_[0]->{name} || 'unknown' };
sub filter {
    my ( $self, $filter ) = @_;

    # Do we care about this document?
    return unless $filter->content_type =~
        m!(application/x-msword|application/msword)!;

    # We need a file name to pass to the wvware program
    my $file = $filter->fetch_filename;

    # Grab output from running program
    my $content = $filter->run_program( $self->{wvware}, "-1", $file );
    return unless $content;

    # update the document's content type
    $filter->set_content_type( 'text/html' );

    # return the document
    return \$content;
}
1;
__END__

=head1 NAME

SWISH::Filters::Doc2html - Perl extension for filtering MSWord documents with Swish-e

=head1 DESCRIPTION

This is a plug-in module that uses the "wvware" program to convert MS Word
documents to HTML for indexing by Swish-e. "wvware" can be downloaded from:

    http://www.wvware.com

The program "wvware" must be installed and in your PATH before running
Swish-e.

This has been tested only under Win32, using the binary package from
http://gnuwin32.sourceforge.net/packages/wv.htm

=head1 SEE ALSO

L<SWISH::Filter>

#================================================================================
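The content-type test at the top of filter() can be tried on its own. What follows is only a sketch: is_word_type is a made-up helper name (it is not part of SWISH::Filter), and the sample types in the loop are just illustrations.

```perl
use strict;
use warnings;

# Same pattern the filter uses to decide whether a document is a Word file.
my $word_type = qr!(application/x-msword|application/msword)!;

# Hypothetical helper: returns 1 when the content type names a Word document.
sub is_word_type {
    my ($content_type) = @_;
    return $content_type =~ $word_type ? 1 : 0;
}

# Show which of a few sample content types the filter would pick up.
for my $type (qw( application/msword application/x-msword text/html application/pdf )) {
    printf "%-22s %s\n", $type, is_word_type($type) ? 'filtered' : 'skipped';
}
```

The pattern is not anchored, so a content type that carries extra text before or after the MIME type still matches.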
Test it on an unsuspecting .DOC file using the swish-filter-test script
(with "-content" option).
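If the test prints the "Can not use Filter" warning, it is worth checking that wvware really is on the PATH (step 2). The sketch below is one way to check from Perl. It does not reproduce SWISH::Filter's own find_binary; find_on_path is a hypothetical helper, and the list of Windows extensions is my assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first file named $prog
# found in @dirs (default: the directories in PATH), or nothing if absent.
sub find_on_path {
    my ( $prog, @dirs ) = @_;
    @dirs = File::Spec->path unless @dirs;

    # Assumption: on Windows the program may carry one of these extensions.
    my @exts = $^O eq 'MSWin32' ? ( '.exe', '.com', '.bat', '' ) : ('');

    for my $dir (@dirs) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile( $dir, $prog . $ext );
            return $candidate if -f $candidate;
        }
    }
    return;
}

my $found = find_on_path('wvware');
print defined $found ? "wvware found at $found\n" : "wvware is not on PATH\n";
```

A changed system PATH is only seen by programs started after the change, so open a new command prompt before re-testing.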
Note the "-1" command line option. This prevents the wvware program from
creating WMF files on your system.
wvware seems very good at translating .doc to HTML (or XML, with options).
After running it at the command line (redirecting output to doc.htm), I was
impressed with the results on the test docs I threw at it.
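For anyone who would rather do the command-line check from a script, here is a rough sketch that runs wvware with "-1" and saves the output to doc.htm. The helper names and the script name doc2htm.pl are made up for illustration. It uses the list form of a piped open, which avoids shell quoting but on Windows needs Perl 5.22 or later.

```perl
use strict;
use warnings;

# Argument list for wvware; "-1" is the same option the filter passes.
sub wvware_command {
    my ($doc) = @_;
    return ( 'wvware', '-1', $doc );
}

# Run wvware on $doc and write whatever it prints to $out.
# Returns the number of characters written.
sub doc_to_html {
    my ( $doc, $out ) = @_;

    open( my $pipe, '-|', wvware_command($doc) )
        or die "Cannot run wvware: $!\n";
    my $html = do { local $/; <$pipe> };    # slurp all output
    close($pipe) or die "wvware failed on $doc\n";

    open( my $fh, '>', $out ) or die "Cannot write $out: $!\n";
    print {$fh} $html;
    close($fh) or die "Cannot close $out: $!\n";
    return length $html;
}

# Only runs when a document is named: perl doc2htm.pl test.doc doc.htm
if (@ARGV) {
    my ( $doc, $out ) = @ARGV;
    $out = 'doc.htm' unless defined $out;
    my $count = doc_to_html( $doc, $out );
    print "Wrote $count characters to $out\n";
}
```

Opening the resulting doc.htm in a browser is a quick way to judge the conversion before indexing anything.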
James Job, MCSE, MCP+I
Washington State Employment Security Department
Webmaster
Received on Sat May 22 20:33:48 2004