At 04:23 AM 11/6/2001 -0800, Klaus Hollenbach wrote:
>FilterDir C:/path/to/perl/script
>FileFilter .pdf pdftotext.pl
>IndexDir C:/test/swish
(I tend to just use a full path in the FileFilter and not use FilterDir.)
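As a rough illustration of what I mean, and assuming the script really does live in the directory you gave for FilterDir, the two filter lines collapse into one. The path is just your own placeholder joined to the script name, so adjust it as needed:

```
FileFilter .pdf C:/path/to/perl/script/pdftotext.pl
IndexDir C:/test/swish
```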
So, can you use #! in Win32? Or do you have to say:
FileFilter .pdf "perl pdftotext.pl %p"
>--- perl script begin ---
>#!d:/programme/perl/bin/perl.exe
>$Program= "path/to/program/pdftotext.exe";
># remove single quotes from parameter (1)
>$Input = $ARGV[0];
>$Input =~ s/\'//g;
Don't need to do that now. For debugging I'd do:
print STDERR "Input file:'$Input'\n";
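If you do keep a wrapper around while debugging, here is a minimal sketch of how it might look with that debug line in place. It is only an illustration under a few assumptions: $Program is the placeholder path from your script, filter_command is a hypothetical helper name I made up, and the trailing "-" relies on pdftotext writing to STDOUT when the output file is given as "-".

```perl
use strict;
use warnings;

# Placeholder path taken from the quoted script; point it at the real binary.
my $Program = "path/to/program/pdftotext.exe";

# Hypothetical helper: build the argument list for pdftotext.
# The trailing "-" asks pdftotext to write the text to STDOUT,
# which is where swish reads the filter output from.
sub filter_command {
    my ($input) = @_;
    return ($Program, $input, "-");
}

# Only run the filter when a file name was actually passed in.
if (@ARGV) {
    my $Input = $ARGV[0];

    # Debug output goes to STDERR so it stays out of the indexed text.
    print STDERR "Input file:'$Input'\n";

    # List-form system() hands the file name over as a single argument;
    # on Unix no shell is involved, so no extra quoting is needed.
    my $status = system(filter_command($Input));
    die "pdftotext failed (status $status)\n" if $status != 0;
}
```

Run it by hand first with a PDF as the only argument, and check that the text shows up on STDOUT and the debug line on STDERR before pointing swish at it.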
>(Swish passes the filename to the associated program/script in single )
>(quotes which gets misinterpreted by pdftotext. Unfortunately I )
>(couldn't change the default values of the FileFilter-Directive using )
>(something like )
>(--- )
>(FileFilter .pdf pdftotext.exe "%p -" )
Oh, so all your Perl program is doing is calling pdftotext? I hope the
documentation is reasonably clear that calling a Perl program just to run
another program will really slow down indexing, since a Perl interpreter
has to start for every file. Just call the program directly.
That looks like the right command, but maybe pdftotext isn't in your path?
>(this produces "err: FileFilter requires two values" )
You need to upgrade swish, as my version doesn't say that.
This is on Linux:
> cat c
FileFilter .pdf pdftotext "%p -"
> ./swish-e -c c -i /usr/X11R6/lib/X11/xfig/xfig.pdf
Indexing Data Source: "File-System"
Indexing "/usr/X11R6/lib/X11/xfig/xfig.pdf"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2195 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
2195 unique words indexed.
4 properties sorted.
1 file indexed. 169502 total bytes.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!
Bill Moseley
mailto:moseley@hank.org
Received on Tue Nov 6 13:53:49 2001