Yesterday I ported swish-e to DJGPP, with minor support for 8-bit character
encodings.
Some comments:
1) To use Win95 long file names, you must set the environment variable LFN=y.
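2) CFLAGS= -O2 -funsigned-char
This compilation flag solves the problem with comparing 8-bit characters:
by default GCC on x86 treats plain char as signed, so bytes above 0x7F
become negative and no longer match their character codes.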
3) #include <fcntl.h>
_fmode=O_BINARY;
These lines force the DJGPP I/O library to open files in standard UNIX
(binary) mode (the default is DOS text mode, with <LF> <-> <CR><LF> translation).
This solves many problems, including (IMHO) the one described in
http://sunsite.berkeley.edu/SWISH-E/Ports/Windows/message2 (integers stored
with fputc).
4) setlocale(LC_ALL,"");
This gives correct UPPER -> LOWER case conversion for national alphabets.
5) #define VOWELCHARS
char indexchars[257]=WORDCHARS;
Support for national languages
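Items 2 to 4 can also be handled in portable C, without depending on a compiler flag or on the DJGPP-specific _fmode variable. A minimal sketch, assuming the helper names is_byte(), write_int(), read_int() and lower_word() (all hypothetical, not taken from swish-e): cast to unsigned char before comparing or calling tolower(), and keep index files in binary mode.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Item 2 (hypothetical helper): compare an 8-bit character portably.
   Casting to unsigned char gives the same answer whether plain char is
   signed or unsigned, so it does not rely on -funsigned-char. */
int is_byte(char c, unsigned char value)
{
    return (unsigned char)c == value;
}

/* Item 3 (hypothetical helpers): store an integer as 4 bytes with fputc()
   and read it back with fgetc().  The stream must be in binary mode
   ("wb+", or _fmode = O_BINARY under DJGPP); in DOS text mode a 0x0A
   byte would be written as 0x0D 0x0A. */
int write_int(FILE *fp, unsigned long n)
{
    int i;
    for (i = 3; i >= 0; i--)
        if (fputc((int)((n >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    return 0;
}

int read_int(FILE *fp, unsigned long *n)
{
    int i, c;
    *n = 0;
    for (i = 0; i < 4; i++) {
        if ((c = fgetc(fp)) == EOF)
            return -1;
        *n = (*n << 8) | (unsigned long)c;
    }
    return 0;
}

/* Item 4 (hypothetical helper): lower-case a word in place.  Call
   setlocale(LC_ALL, "") once at startup so tolower() follows the national
   alphabet; the cast keeps bytes >= 0x80 out of the negative range,
   which tolower() is not required to accept. */
void lower_word(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

For example, writing 0x0A0D1A0F with write_int() to a stream from tmpfile() (which is opened "wb+") and reading it back with read_int() returns the same value. In DOS text mode the 0x0A byte would instead gain a <CR>, and on reading a 0x1A (Ctrl-Z) byte may be taken as end of file.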
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This part of swish-e needs more changes.
My version of VOWELCHARS implements Russian vowels in the CP866 encoding;
many languages, I think, need a similar configurable table.
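Such a table can start as simply as a per-language vowel string. A minimal sketch, assuming a hypothetical function isvowel_cfg() and a default VOWELCHARS of plain "aeiou" (the patch puts the Russian CP866 vowels there instead). Unlike the loop in check.c it uses strchr(), which needs an explicit guard because strchr() also matches the terminating '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical default: only the ASCII vowels.  A language port would
   define VOWELCHARS itself and append the 8-bit vowel codes of its code
   page (CP866 for Russian under DOS). */
#ifndef VOWELCHARS
#define VOWELCHARS "aeiou"
#endif

/* Return 1 if c is listed in VOWELCHARS, else 0.  strchr() converts its
   argument to char, so bytes >= 0x80 match the string's bytes even when
   plain char is signed.  '\0' is rejected because strchr() would
   otherwise find the string terminator. */
int isvowel_cfg(int c)
{
    unsigned char uc = (unsigned char)c;
    if (uc == '\0')
        return 0;
    return strchr(VOWELCHARS, uc) != NULL;
}
```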
indexchars? I don't understand the difference between indexchars and
WORDCHARS, if logically they are identical.
Please answer me.
Because the definition of indexchars is not in config.h, I spent about 30
minutes trying to understand why swish-e didn't work with Russian.
In any case, it would be more elegant to create a table of character
properties, like this:
#define SW_WRD 0x01   /* word character */
#define SW_BEG 0x02   /* may begin a word */
#define SW_END 0x04   /* may end a word */
#define SW_VWL 0x08   /* vowel */
/* ... */
unsigned char sw_CHARS[256]=
{
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* ... */   /* 0x00 - 0x1F */
    /* ... */
    SW_WRD | SW_BEG | SW_END | SW_VWL,   /* 0x61 a */
    /* ... */
    SW_WRD | SW_BEG | SW_END | SW_VWL,   /* 0xEF russian ya (CP866) */
    /* ... 0xF0 - 0xFF */
};
#define isvowel(c) ( sw_CHARS[(unsigned char)(c)] & SW_VWL )
/* ... */
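Rather than writing all 256 entries of sw_CHARS by hand, the table could be filled once at startup from character-class strings such as WORDCHARS and VOWELCHARS. A minimal sketch, assuming hypothetical helpers sw_mark() and sw_init_chars(), reading SW_BEG/SW_END as "may begin/end a word", and keeping the flag values from the table above.

```c
#include <string.h>

#define SW_WRD 0x01   /* word character */
#define SW_BEG 0x02   /* may begin a word */
#define SW_END 0x04   /* may end a word */
#define SW_VWL 0x08   /* vowel */

static unsigned char sw_CHARS[256];

/* Hypothetical helper: OR flag into the entry of every byte in chars.
   Walking the string as unsigned char keeps bytes >= 0x80 in 128..255. */
static void sw_mark(const char *chars, unsigned char flag)
{
    const unsigned char *p;
    for (p = (const unsigned char *)chars; *p != '\0'; p++)
        sw_CHARS[*p] |= flag;
}

/* Hypothetical initializer: rebuild the whole table from four
   configurable strings instead of a hand-written 256-entry array. */
void sw_init_chars(const char *word, const char *beg,
                   const char *end, const char *vowels)
{
    memset(sw_CHARS, 0, sizeof sw_CHARS);
    sw_mark(word, SW_WRD);
    sw_mark(beg, SW_BEG);
    sw_mark(end, SW_END);
    sw_mark(vowels, SW_VWL);
}

#define sw_isvowel(c) (sw_CHARS[(unsigned char)(c)] & SW_VWL)
#define sw_isword(c)  (sw_CHARS[(unsigned char)(c)] & SW_WRD)
```

A port could then call sw_init_chars(WORDCHARS, BEGINCHARS, ENDCHARS, VOWELCHARS) once in main(), assuming config.h provides matching BEGINCHARS and ENDCHARS strings; adding a language then means editing strings, not the table.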
Additionally, it would be useful to move language-dependent information,
such as character classes and stopwords, into an external config file.
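Moving this information into an external file mostly needs a small line parser. A minimal sketch, assuming a hypothetical file format of one "Directive value" pair per line with '#' comments; the directive names VowelChars and StopWords are invented for illustration. The caller would read lines with fgets() and pass each one to the hypothetical parse_config_line().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split one line of a hypothetical language file such as
       VowelChars aeiou
       StopWords  the a an
   into a directive name (*key) and the rest of the line (*value).
   Returns 1 on success, 0 for blank lines and '#' comments.
   The line is modified in place; no memory is allocated. */
int parse_config_line(char *line, char **key, char **value)
{
    char *p = line;
    char *e;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;

    *key = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;
    if (*p != '\0')
        *p++ = '\0';              /* terminate the key */
    while (isspace((unsigned char)*p))
        p++;
    *value = p;

    /* strip trailing whitespace, including the newline left by fgets() */
    e = p + strlen(p);
    while (e > p && isspace((unsigned char)e[-1]))
        *--e = '\0';
    return 1;
}
```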
Here are my minor changes (I removed the sw_CHARS table described above,
with its Russian support, and all changes connected with it):
----------------------------------------------------swish-e.dif--------------------------------------------------
diff srcdos8/Makefile src/Makefile
13c13
< CFLAGS= -O2 -funsigned-char
---
> CFLAGS= -O2
diff srcdos8/check.c src/check.c
175,178c175,176
< int i;
< for (i = 0; VOWELCHARS[i] != '\0'; i++)
< if (c == VOWELCHARS[i])
< return 1;
---
> if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
> return 1;
diff srcdos8/config.h src/config.h
125,126d124
< #define VOWELCHARS "aeiou ?³RÇÌÎÏÐ"
<
diff srcdos8/file.c src/file.c
47d46
< #ifndef DJGPP
53,55d51
< #else
< return 0;
< #endif
diff srcdos8/swish.c src/swish.c
36,39d35
< setlocale(LC_ALL,"");
< #ifdef DJGPP
< _fmode=O_BINARY;
< #endif
diff srcdos8/swish.h src/swish.h
21,24d20
< #ifdef DJGPP
< #include <fcntl.h>
< #endif
< #include <locale.h>
186c182
< char indexchars[257]=WORDCHARS;
---
> char *indexchars = "abcdefghijklmnopqrstuvwxyz&#;0123456789_\\|/-+=?!@$%^'\"`~,.<>[]{}";
----------------------------------------------------end swish-e.dif--------------------------------------------------
Received on Sun Apr 12 23:05:08 1998