On Tue, Mar 16, 2004 at 03:20:41PM -0800, Lung.Allen wrote:
>
>
> What does running:
>
> ./howto-pdf-prog.pl | head
>
> show?
>
> ./howto-pdf-prog.pl | head
> SCALAR(0x80779a8)SCALAR(0x80a91d4)SCALAR(0x804ca20)SCALAR(0x80a5b10)SCALAR(0x80a9204)SCALAR(0x80a92dc)SCALAR(0x80a
It looks like your program is not working correctly: it is printing scalar references rather than the text they point to.
Do you have any experience with Perl?
I copied this from the Linux article (adding the "use lib" path) and it seems to work OK.
Is this the script you are using?
moseley@bumby:~$ cat x.pl
#!/usr/bin/perl -w
use lib '/home/moseley/123/swish-e-2.4.1/prog-bin';
use pdf2xml;
my @files =
    `find /usr/share/cups/doc-root -name 'i*.pdf' -print`;
for (@files) {
    chomp();
    my $xml_record_ref = pdf2xml($_);
    # this is one XML file with a SWISH-E header
    print $$xml_record_ref;
}
moseley@bumby:~$ perl x.pl | swish-e -S prog -i stdin -T properties
Indexing Data Source: "External-Program"
Indexing "stdin"
swishdocpath: 6 ( 35) S: "/usr/share/cups/doc-root/ja/idd.pdf"
swishtitle: 7 ( 33) S: "CUPS Interface Design Description"
swishdocsize: 8 ( 4) N: "43274"
swishlastmodified: 9 ( 4) D: "2003-11-14 08:31:42 PST"
swishdocpath: 6 ( 35) S: "/usr/share/cups/doc-root/ja/ipp.pdf"
swishtitle: 7 ( 26) S: "CUPS Implementation of IPP"
swishdocsize: 8 ( 4) N: "70011"
swishlastmodified: 9 ( 4) D: "2003-11-14 08:31:43 PST"
swishdocpath: 6 ( 32) S: "/usr/share/cups/doc-root/idd.pdf"
swishtitle: 7 ( 33) S: "CUPS Interface Design Description"
swishdocsize: 8 ( 4) N: "43274"
swishlastmodified: 9 ( 4) D: "2004-03-05 04:00:48 PST"
swishdocpath: 6 ( 32) S: "/usr/share/cups/doc-root/ipp.pdf"
swishtitle: 7 ( 26) S: "CUPS Implementation of IPP"
swishdocsize: 8 ( 4) N: "70011"
swishlastmodified: 9 ( 4) D: "2004-03-05 04:00:48 PST"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 1,564 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
1,564 unique words indexed.
4 properties sorted.
4 files indexed. 226,570 total bytes. 30,530 total words.
Elapsed time: 00:00:02 CPU time: 00:00:01
Indexing done!
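
The SCALAR(0x...) output quoted at the top is what Perl prints when a scalar reference is printed directly instead of being dereferenced. I have not seen your script, so this is only a guess at the cause. Below is a minimal sketch of the difference. make_record() is a hypothetical stand-in for pdf2xml(), and I am assuming (from the "print $$xml_record_ref" line in x.pl) that pdf2xml() returns a reference to a string:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for pdf2xml(): returns a REFERENCE to a string.
# (Assumption: this mirrors what pdf2xml() does, judging by x.pl.)
sub make_record {
    my ($path) = @_;
    my $record = "<doc>$path</doc>\n";
    return \$record;
}

my $ref = make_record('example.pdf');

# Using the reference itself as a string yields SCALAR(0x...).
my $wrong = "$ref";

# Dereferencing with a second $ yields the string it points to.
my $right = $$ref;

print "reference:    $wrong\n";
print "dereferenced: $right";
```

If your script has "print $xml_record_ref;" with a single $, changing it to "print $$xml_record_ref;" as in x.pl may be all that is needed.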
--
Bill Moseley
moseley@hank.org
Received on Tue Mar 16 15:42:09 2004