Hi all,
I have a problem using filters in swish-e, especially with catdoc. I'll
explain:
On our school intranet we have been using swish-e / simple web search
for searching for 3 weeks now. Great!! But...
To create the index I use 2 filters:
1) The xpdf port to Windows for the PDF files. The trick to using it is
making an appropriate bat file to get rid of the single quotes that come
with the filename argument. Maybe there is another solution, but
I figured out this one:
The swish.conf file contains:
------------
# pdf and doc files are handled separately:
#
FilterDir "c:/phpdev/www/swishe/filter-bin"
FileFilter .pdf pdffilter.bat
FileFilter .doc docfilter.bat
-------------
And pdffilter.bat:
-------------
:: /bin/sh
:: Adobe PDF filter
:: see: http://www.foolabs.com/xpdf/
::echo "----------------------"
::echo "parameter: ", %1
set NAAM=%1
::echo %1, %NAAM%
set NBBN= %NAAM:~1,-1%
c:\phpdev\www\swishe\filter-bin\pdftotext.exe %NBBN% -
-------------
The key is in the set statement: it strips the first and last quote from the filename; otherwise xpdf will give a file-not-found error.
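As an illustration only (not part of the setup described here), the effect of the batch expansion %NAAM:~1,-1% can be sketched in Python: it skips the first character and drops the last one, which matches the slice [1:-1]. The function name and the sample path are hypothetical.

```python
def strip_outer_quotes(arg: str) -> str:
    """Mimic the batch expansion %NAAM:~1,-1%.

    The first and the last character are dropped unconditionally,
    just as the set statement in pdffilter.bat does.
    """
    return arg[1:-1]


# Hypothetical argument as the filter might receive it, wrapped in single quotes.
quoted = "'c:/phpdev/www/docs/example.pdf'"
print(strip_outer_quotes(quoted))
```

Note that, like the batch version, this does not check whether the outer characters really are quotes.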
2) catdoc to convert Word docs. But it doesn't work well because, as I discovered, it is limited to 8.3 filenames:
dosbox output:
..
test2.doc (8263 words)
test3.doc (998 words)
verslag16_10.doccatdoc: No such file or directory
(49 words)
..
I tried a workaround: copy the long filename to a temporary 8.3 filename, convert that, and then delete it. This doesn't work:
In docfilter.bat:
---------------
set NAAM=%1
:: echo %1, %NAAM%
set NBBN= %NAAM:~1,-1%
copy/y %NBBN% tempdoc.doc
c:\phpdev\www\swishe\filter-bin\catdoc.exe -s8859-1 -d8859-1 tempdoc.doc
del/q tempdoc.doc
----------------
dosbox output:
..
test2.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
(67 words)
test3.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
(67 words)
verslag16_10.doccatdoc: No such file or directory
Could Not Find C:\phpdev\www\swishe\tempdoc.doc
(70 words)
..
It looks like the copy statement does not do its job. But why? And is there an alternative? Can anyone help?
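To make the intent concrete, the copy / convert / delete sequence that docfilter.bat is meant to perform can be sketched as follows. This is only a sketch in Python, not a tested replacement for the bat file: the function name is hypothetical, and it assumes the converter takes the file to convert as its last argument and writes the text to standard output.

```python
import os
import shutil
import subprocess
import tempfile


def convert_via_short_name(source, converter_cmd):
    """Copy `source` to an 8.3-style name, convert the copy, then delete it.

    `converter_cmd` is the converter command as a list of arguments; the
    path of the temporary copy is appended as the last argument.
    Returns whatever the converter wrote to standard output, as bytes.
    """
    workdir = tempfile.mkdtemp()
    short_copy = os.path.join(workdir, "tempdoc.doc")  # fits the 8.3 limit
    try:
        shutil.copyfile(source, short_copy)
        result = subprocess.run(
            converter_cmd + [short_copy],
            stdout=subprocess.PIPE,
            check=True,  # raise if the converter reports an error
        )
        return result.stdout
    finally:
        # Remove the temporary copy even if the conversion failed.
        shutil.rmtree(workdir, ignore_errors=True)
```

With the catdoc call from docfilter.bat, converter_cmd would be something like ["c:\\phpdev\\www\\swishe\\filter-bin\\catdoc.exe", "-s8859-1", "-d8859-1"].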
I've tried and tried and tried; I am a non-Unix person (ouch; sorry, but it never crossed my path; I use Apple/PC) and just a bit of an end user of this stuff.
Yours sincerely,
Bertus Douma
Webmaster Chr. Highschool Northern Netherlands (CHN)
The Netherlands
Received on Fri Nov 30 09:23:14 2001