Using catdoc/xpdf in WinNT environment

From: Bertus Douma <BDOUMA(at)not-real.chn.nl>
Date: Fri Nov 30 2001 - 09:21:54 GMT
Hi all,

I have a problem using filters in swish-e, especially with catdoc. I'll
explain:

For the past 3 weeks we have been using swish-e / Simple Web Search
to search our school intranet. Great! But...

To create the index I use 2 filters:
1) The Windows port of xpdf for the PDF files. The trick to using it is to write
a suitable bat file that gets rid of the single quotes that come with the
filename argument. Maybe there is another solution, but this is the one
I figured out:

the swish.conf file contains:
------------
# pdf and doc files are handled separately:
#
FilterDir "c:/phpdev/www/swishe/filter-bin"
FileFilter  .pdf   pdffilter.bat
FileFilter  .doc   docfilter.bat
-------------

And pdffilter.bat:
-------------
:: /bin/sh
:: Adobe PDF filter
:: see: http://www.foolabs.com/xpdf/ 

::echo "----------------------"
::echo "parameter: ", %1

set NAAM=%1
::echo %1, %NAAM%
set NBBN= %NAAM:~1,-1%

c:\phpdev\www\swishe\filter-bin\pdftotext.exe %NBBN% - 
-------------

The key is the set statement: it strips the first and last quote from the argument, since otherwise xpdf gives a file-not-found error.
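
For comparison, the same quote stripping can be sketched outside of batch. This is only an illustration, assuming Python is available on the indexing machine; the function name strip_wrapping_quotes and the example path are made up for this sketch. Unlike %NAAM:~1,-1%, which always drops the first and last character, it only removes them when they really are a matching pair of quotes. Note also that the space after the equals sign in "set NBBN= ..." becomes part of the variable's value, so NBBN starts with a space.

```python
def strip_wrapping_quotes(arg: str) -> str:
    """Remove one pair of matching quotes around arg, if there is one."""
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in ("'", '"'):
        return arg[1:-1]
    # No wrapping quotes: leave the argument untouched.
    return arg


if __name__ == "__main__":
    # Hypothetical path, quoted the way the filter argument is described above.
    print(strip_wrapping_quotes("'c:/docs/verslag16_10.pdf'"))
    # prints: c:/docs/verslag16_10.pdf
```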

2) catdoc to convert Word documents. But it doesn't work well because, as far as I can tell, it is limited to 8.3 filenames:
dosbox output: 
..
  test2.doc (8263 words)
  test3.doc (998 words)
  verslag16_10.doccatdoc: No such file or directory
 (49 words)
..

I tried a workaround: copy the long-named file to a temporary 8.3 filename, convert that, and then delete it. That doesn't work either.
In docfilter.bat:
---------------
set NAAM=%1
:: echo %1, %NAAM%
set NBBN= %NAAM:~1,-1%

copy/y %NBBN% tempdoc.doc
c:\phpdev\www\swishe\filter-bin\catdoc.exe -s8859-1 -d8859-1 tempdoc.doc
del/q tempdoc.doc 
----------------

dosbox output:
..
  test2.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
 (67 words)
  test3.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
 (67 words)
  verslag16_10.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
 (70 words)
..

It looks like the copy statement does not do its job. But why? And is there an alternative? Can anyone help?
I've tried and tried and tried. I am not a Unix person (sorry, it never crossed my path; I use Apple/PC), just a bit of an end user of this stuff.
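
One alternative to doing the copy in the batch file is a small script that makes the temporary copy under an absolute path, so it does not depend on the current directory (the "Could Not Find" messages above show tempdoc.doc being looked for in C:\phpdev\www\swishe). This is a rough sketch, not a confirmed fix: it assumes Python is available, the name convert_via_short_copy and its converter argument are invented for the example, and only the file name is shortened, not the directories leading to it. The catdoc command line is the one from docfilter.bat.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def convert_via_short_copy(source, converter):
    """Copy source to a short-named temp file, run converter on it, return its stdout."""
    ext = os.path.splitext(source)[1][:4]  # at most a dot plus three characters
    tmp_dir = tempfile.mkdtemp()
    short_path = os.path.join(tmp_dir, "tempdoc" + ext)  # file name fits 8.3
    try:
        shutil.copyfile(source, short_path)
        result = subprocess.run(
            converter + [short_path], stdout=subprocess.PIPE, check=True
        )
        return result.stdout
    finally:
        # Remove the temporary copy even if the converter failed.
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Converter path and options copied from docfilter.bat above.
    catdoc = [r"c:\phpdev\www\swishe\filter-bin\catdoc.exe", "-s8859-1", "-d8859-1"]
    filename = sys.argv[1].strip("'\"")  # drop the quotes around the argument
    sys.stdout.buffer.write(convert_via_short_copy(filename, catdoc))
```

The converter's output is passed through on standard output, which is where the batch filters above also write their text.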

Yours sincerely,

Bertus Douma
Webmaster Chr. Highschool Nothern Netherlands (CHN)
The Netherlands
Received on Fri Nov 30 09:23:14 2001