Filters on WinNT

From: Klaus Hollenbach <hollenbach(at)not-real.scholze.de>
Date: Tue Nov 06 2001 - 12:24:20 GMT
I'm having a problem using a filter.

I'm running Swish-e 2.1 dev 20 for Windows on WinNT 4.0.
(http://www.webaugur.com/wares/files/swish-e-2.1-dev-20-win32.zip)
I tried to index a PDF file, "C:/test/swish/manual.pdf", using the
FileFilter directive in my config file.

--- user.config begin ---
FilterDir C:/path/to/perl/script
FileFilter .pdf pdftotext.pl
IndexDir C:/test/swish
--- user.config end ---

pdftotext.pl looks like this:

--- perl script begin ---
#!d:/programme/perl/bin/perl.exe
$Program= "path/to/program/pdftotext.exe";
# remove single quotes from parameter     (1)
$Input = $ARGV[0];
$Input =~ s/\'//g;
# run pdftotext
open(CONVERT,"|$Program $Input -") || die;
close(CONVERT);
--- perl script end ---

(1)
(Swish passes the filename to the associated program/script in single )
(quotes, which get misinterpreted by pdftotext. Unfortunately I       )
(couldn't change the default values of the FileFilter-Directive using )
(something like                                                       )
(---                                                                  )
(FileFilter .pdf pdftotext.exe "%p -"                                 )
(---                                                                  )
(this produces "err: FileFilter requires two values"                  )
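
A minimal sketch of an alternative wrapper, for comparison with the script
above: it opens pdftotext through a read pipe and copies the converter's
output to STDOUT itself, instead of relying on the child process inheriting
the handle. The helper names clean_arg, build_command and run_filter are
hypothetical, the program path is the same placeholder as above, and the
sketch assumes Swish-e reads the filtered text from the script's STDOUT. It
has not been tested on WinNT with Swish-e 2.1-dev-20.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder path to the converter, as in the script above; adjust locally.
my $PROGRAM = 'path/to/program/pdftotext.exe';

# Strip the single quotes that Swish-e wraps around the file name.
sub clean_arg {
    my ($arg) = @_;
    $arg =~ s/'//g;
    return $arg;
}

# Build the command line. The trailing "-" tells pdftotext to write the
# extracted text to standard output instead of a file.
sub build_command {
    my ($program, $file) = @_;
    return qq{$program "$file" -};
}

# Run the converter through a read pipe and copy its output to STDOUT
# (assumption: this is where the indexer picks up the filtered text).
sub run_filter {
    my ($file) = @_;
    my $cmd = build_command($PROGRAM, clean_arg($file));
    open(my $convert, '-|', $cmd) or die "cannot run $cmd: $!";
    print while <$convert>;
    close($convert) or die "converter failed, exit status $?";
}

# Only run when a file name was passed on the command line.
run_filter($ARGV[0]) if @ARGV;
```

Running the wrapper by hand with a quoted file name, for example
perl pdftotext.pl "'C:/test/swish/manual.pdf'", should print the extracted
text to the console if the converter path is correct.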


When indexing with "swishe -c conf/user.config", the PDF file does not seem
to get indexed. I receive the following output on the command line:

--- swishe output begin ---
Indexing Data Source: "File-System"
Indexing C:/test/swish..

Checking dir "C:/test/swish"...
  Handbuch.pdf - Using DEFAULT filter -  (no words)

Removing very common words...
DBG: In removestops
no words removed.
Writing main index...
Writing header ...
Writing index entries ...
Sorting Words alphabetically
Writing stopwords ...
no unique words indexed.
Writing file index...
Writing file list ...
DBG: Starting sorting of properties
DBG: End sorting of properties
Writing file offsets ...
Writing MetaNames ...
Writing Location lookup tables ...
Writing offsets (2)...
1 file indexed.
Running time: 1 second.
Indexing done!
--- swishe output end ---

The output from pdftotext does not seem to make its way to swish.
Any help getting this to work would be greatly appreciated.

-- 
Klaus Hollenbach
SCHOLZE Ingenieurgesellschaft mbH
E-Mail:  hollenbach@scholze.de
Received on Tue Nov 6 12:25:00 2001