Hello,
According to the somewhat ambiguous explanation in
http://sunsite.berkeley.edu/SWISH-E/manual.html
# Can I index 8-bit text?
# Yes, if the text uses the HTML equivalents for the ISO-Latin-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# (ISO8859-1) character set. Upon indexing swish-e will convert
# all numbered entities it finds (such as &#169;) to named
# entities (such as &copy;). To search for words including
# these codes, type the named entity (if it exists) in place of
# the 8-bit character. Swish will also convert entities to
# ASCII equivalents, so words that might look like this in HTML:
# resum&eacute; can be searched as this: resume.
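The conversion the quoted manual describes can be illustrated with a short Python sketch. This is not Swish-e's own code. `numbered_to_named` and `to_ascii` are hypothetical helpers built on the standard library's `html.entities.codepoint2name` table (the HTML 4 entity set), `html.unescape`, and Unicode NFKD decomposition. NFKD is only one way of folding accented letters to ASCII, and it is not necessarily the method Swish-e uses.

```python
import html
import re
import unicodedata
from html.entities import codepoint2name

def numbered_to_named(text: str) -> str:
    """Hypothetical helper: rewrite decimal entities (&#169;) as named ones
    (&copy;) when HTML 4 defines a name; otherwise leave them untouched."""
    def repl(m: re.Match) -> str:
        name = codepoint2name.get(int(m.group(1)))
        return f"&{name};" if name else m.group(0)
    return re.sub(r"&#(\d+);", repl, text)

def to_ascii(text: str) -> str:
    """Hypothetical helper: decode entities, then strip accents to get a
    plain-ASCII search form (assumption: NFKD folding, not Swish-e's code)."""
    decoded = html.unescape(text)
    # NFKD splits "é" into "e" plus a combining accent; the accent is
    # non-ASCII, so encode(..., "ignore") drops it.
    folded = unicodedata.normalize("NFKD", decoded)
    return folded.encode("ascii", "ignore").decode("ascii")

print(numbered_to_named("&#169; 1998"))  # &copy; 1998
print(to_ascii("resum&eacute;"))         # resume
```

Note that `numbered_to_named("&#261;")` returns its input unchanged. HTML 4 defines no name for U+0105 (ą), which is exactly the Latin-2 gap raised further on.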
Swish-e can _solely_ be used for indexing files
containing HTML entities, i.e. the 7-bit equivalents of
8-bit Latin-1 text; ergo it can be used FOR 7-BIT ONLY,
*NOT* FOR 8-BIT TEXT FILES. I'm not quite sure why there
should be such a restriction, or else my reading of the
above is all wrong.
But what about other 8-bit character sets for which there
are no standardized HTML equivalents, like the ISO-8859-2
(Latin-2) alphabet? Would Swish-e index these correctly,
possibly also with a custom 'IgnoreWords' directive or
stopwords.conf file? If Swish-e cannot be used, does anyone
know of a suitable equivalent freeware solution for FreeBSD?
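For the Latin-2 question, a small Python sketch shows why the character set matters. The same byte decodes to different letters under ISO-8859-1 and ISO-8859-2. The helper `latin2_to_ncr` is hypothetical. It rewrites Latin-2 text as 7-bit ASCII using decimal numeric character references. Whether Swish-e would then treat such references as word characters is an untested assumption here.

```python
import unicodedata
from html.entities import codepoint2name

# One 8-bit byte, two meanings depending on the assumed character set.
byte = b"\xb1"
print(unicodedata.name(byte.decode("iso-8859-1")))  # PLUS-MINUS SIGN
print(unicodedata.name(byte.decode("iso-8859-2")))  # LATIN SMALL LETTER A WITH OGONEK

# HTML 4 has no named entity for U+0105, so only a numeric reference is possible.
print(codepoint2name.get(0x105))  # None

def latin2_to_ncr(data: bytes) -> str:
    """Hypothetical helper: decode ISO-8859-2 bytes and emit pure 7-bit ASCII,
    replacing each non-ASCII character with a decimal reference like &#322;."""
    text = data.decode("iso-8859-2")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(latin2_to_ncr(b"Ko\xb3o"))  # Ko&#322;o
```

Pre-converting files this way keeps the input 7-bit. The cost is that searches must then use the same numeric form (here `Ko&#322;o`), because no named entity exists to fall back on.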
Please R)eply with Cc: ianf@random.se
Thanks much in advance.
__Ian
Received on Sun Mar 1 14:04:02 1998