I am having many little problems with 2.4.3 on Windows. I have ActivePerl
5.8.8. Any assistance is most welcome. I apologize in advance if I missed
anything obvious.
Thanks!
==========
1. Accented characters do not get translated properly
==========
My cfg is as follows:
IndexFile hrtool.index
IndexDir ../docs
IndexOnly .doc
FileFilter .doc ./lib/swish-e/swish_filter.pl '"%p" "%P"'
TranslateCharacters :ascii7:
A word spelled "montréal" gets converted to "montrcal", as shown by -T
INDEXED_WORDS.
Adding:[7:swishdefault(1)] 'montrcal' Pos:2 Stuct:0x9 ( BODY FILE )
Adding:[7:swishdefault(1)] 'montrcal' Pos:3 Stuct:0x9 ( BODY FILE )
Other accented letters produce similar odd results.
I tried both of the following options; neither helps:
TranslateCharacters :ascii7:
#TranslateCharacters é e
If I omit TranslateCharacters, words get split at each accented letter:
"Montréal" becomes two words, "montr" and "al".
I need to be able to parse English and French documents. I don't mind
"losing" accented letters during indexing; in fact, I was quite happy when I
read that swish could do that for me.
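
To make the expected result concrete, here is a small Python sketch of the two behaviours described above. It is only an illustration, not swish-e code: the function names fold_to_ascii and split_on_non_ascii are hypothetical, and the folding uses Unicode decomposition, which is an assumption about what an "ascii7"-style translation should produce rather than a description of how swish-e does it.

```python
import re
import unicodedata


def fold_to_ascii(word):
    """Lowercase a word and strip diacritics (the result hoped for from :ascii7:)."""
    # NFKD splits "é" into "e" plus a combining accent; the accent is then
    # dropped because it cannot be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", word)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def split_on_non_ascii(word):
    """Mimic the splitting seen when no translation is configured:
    any character outside a-z/0-9 acts as a word boundary."""
    return [part for part in re.split(r"[^a-z0-9]+", word.lower()) if part]


print(fold_to_ascii("Montréal"))
print(split_on_non_ascii("Montréal"))
```

The first call shows the word I would like to see in the index ("montreal"); the second reproduces the "montr" / "al" split I get without TranslateCharacters.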
==========
2. Can't locate object method "filter" via package "SWISH::Filter"
==========
I am running the following command:
swish-e -c ..\hrtool.cfg -S prog
swish_filter reports an error and nothing gets indexed.
Can't locate object method "filter" via package "SWISH::Filter"
My cfg is as follows:
IndexFile hrtool.index
IndexDir swish_filter.pl
SwishProgParameters ../docs
TranslateCharacters :ascii7:
In 2.4.2, I have a different error:
Indexing Data Source: "External-Program"
Indexing "swish_filter.pl"
External Program found: C:\phil\pgms\swish\swish-2.4.2\lib\swish-e/swish_filter.pl
Use of uninitialized value in concatenation (.) or string at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\perl/SWISH/Filter.pm line 341.
Failed to set content type for file reference ''Use of uninitialized value in concatenation (.) or string at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\swish_filter.pl line 53.
- Not filtered: (../docs)
Use of uninitialized value in print at C:\phil\pgms\swish\swish-2.4.2\lib\swish-e\swish_filter.pl line 56.
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
With a different config, swish_filter will work under 2.4.2 (but never under
2.4.3):
IndexFile hrtool.index
IndexDir ../docs
IndexOnly .doc
FileFilter .doc ./lib/swish-e/swish_filter.pl '"%p" "%P"'
TranslateCharacters :ascii7:
But needless to say, I'd prefer not to have to define individual filters.
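
For context on what -S prog expects, my understanding is that the external program writes each document to standard output as a few headers (Path-Name, Content-Length, Last-Mtime, Document-Type), a blank line, and then the content. The Python sketch below only illustrates that layout. The helper name format_prog_document is hypothetical, and the "TXT*" parser type and the example path are assumptions chosen for the illustration; this is not what swish_filter.pl itself does.

```python
def format_prog_document(path, content, mtime, doc_type="TXT*"):
    """Build one document record in the header + blank line + body layout.

    `content` must be bytes so that Content-Length is a byte count.
    """
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"
        f"Last-Mtime: {mtime}\n"
        f"Document-Type: {doc_type}\n"
        "\n"  # blank line separates the headers from the body
    )
    return headers.encode("utf-8") + content


# Hypothetical example document; a real program would write this to stdout.
record = format_prog_document("../docs/example.txt", b"hello world", 1158943654)
print(record.decode("utf-8"))
```

If the filtered text never reaches swish-e in this shape, "No unique words indexed!" would be the expected outcome, which matches what I see under 2.4.2.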
==========
3. Systematic error on PDF files: "May not be a PDF file"
==========
The complete error is the following:
Error: May not be a PDF file (continuing anyway)
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
I get this error with PDFs generated by OpenOffice or by a PDF printer in
Windows.
The options I use to parse them are the following:
IndexOnly .pdf
FileFilter .pdf ./lib/swish-e/pdftotext.exe '"%p" "%P"'
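
One way to tell whether the files themselves are damaged, or whether the problem lies in how they are handed to pdftotext, is to check the raw bytes. The Python sketch below is a rough diagnostic under stated assumptions: the function names are hypothetical, and the 1024-byte windows for the "%PDF-" header and the "%%EOF" marker are my assumption about where a reader would look, not a statement of what pdftotext checks.

```python
def looks_like_pdf(data):
    """True if the '%PDF-' header marker appears near the start of the data."""
    return b"%PDF-" in data[:1024]


def has_eof_marker(data):
    """True if the '%%EOF' marker appears near the end of the data."""
    return b"%%EOF" in data[-1024:]


def describe(data):
    """Summarise both checks for a file's bytes."""
    return {"header": looks_like_pdf(data), "eof": has_eof_marker(data)}


# Tiny in-memory stand-ins; for a real file, pass open(path, "rb").read().
print(describe(b"%PDF-1.4\n...objects...\nstartxref\n0\n%%EOF\n"))
print(describe(b"Extracted plain text, not a PDF"))
```

If a file passes both checks before indexing and fails them afterwards, the file was modified during the run, which would point at the filter command rather than at the PDF generator.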
Received on Fri Sep 22 16:47:34 2006