By the way:
On Fri, Dec 02, 2005 at 03:29:45AM -0800, David Larkin wrote:
>
> #!/usr/bin/perl -w
> use pdf2xml;
> my @files =
> `find ./pdf/ -name '*.pdf' -print`;
> for (@files) {
> chomp();
> my $xml_record_ref = pdf2xml($_);
> # this is one XML file with a SWISH-E header
> print $$xml_record_ref;
> }
>
> I've tried to build an equivalent for Word docs. I came up with
>
> #!/usr/bin/perl -w
>
> my @files =
> `find ./doc/ -name '*.doc' -print`;
> for (@files) {
> chomp();
> my $xml_record_ref = exec "/usr/local/bin/catdoc $_";
> # this is one XML file with a SWISH-E header
> print $$xml_record_ref;
> }
>
> Warning: Unknown header line: 'Cricket Roundup' from program ./howto-doc-prog.pl
> err: External program failed to return required headers Path-Name:
Note:
> my $xml_record_ref = exec "/usr/local/bin/catdoc $_";
> print $$xml_record_ref;
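You are just feeding the content to swish. How will swish know where
one doc ends and the next one starts? Or each doc's file name?
I suspect if you ran these from the command line you would see the
difference. The -S prog method needs headers (Path-Name and
Content-Length) in front of each document to know what the file name
is and how long it is.
Also, exec replaces your script with catdoc and never returns, so
there is nothing to assign or dereference. Use backticks or a piped
open if you want to capture the output.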
Use SWISH::Filter.
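
If you would rather keep calling catdoc directly, something along these
lines might be a starting point. It is only a rough sketch: the sub names
(format_record, doc_to_text, main) are made up for illustration, the
catdoc path is the one from your script, and it assumes plain text
output is what you want indexed. It captures catdoc's output with a
piped open and prints a Path-Name and Content-Length header before each
document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one -S prog record: headers, a blank line, then the content.
# (format_record is a hypothetical helper name, not part of swish-e.)
sub format_record {
    my ($path, $content) = @_;
    # Content-Length is the size in bytes of the body after the blank line.
    my $length = length $content;
    return "Path-Name: $path\nContent-Length: $length\n\n$content";
}

# Run catdoc on one file and return its output as a string.
# A piped open captures the output; exec would never return.
sub doc_to_text {
    my ($path) = @_;
    open my $fh, '-|', '/usr/local/bin/catdoc', $path
        or die "cannot run catdoc on $path: $!";
    local $/;               # slurp the whole output at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

sub main {
    my @files = `find ./doc/ -name '*.doc' -print`;
    for my $file (@files) {
        chomp $file;
        my $text = doc_to_text($file);
        next unless defined $text && length $text;   # skip empty conversions
        print format_record($file, $text);
    }
}

main() unless caller;
```

Run it from the command line first and check that every document starts
with its two header lines followed by a blank line.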
--
Bill Moseley
moseley@hank.org
Received on Fri Dec 2 05:10:32 2005