Bill Moseley has been working on some changes to swish-e, and
I previously took a stab at preventing strcpy/sprintf overruns
by passing and limiting string length, so we put our heads
together and merged our changes.
I extracted Bill's comments from the source (see below), but as
I understand it the big changes are repeated (recursive) stemming
and keeping the stemmer from returning an empty string. He has also
added a new option for handling meta tags.
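To make the two stemming changes concrete, here is a minimal sketch. It is not Bill's code: stem_once() is a hypothetical toy stand-in for the real Stem() that only strips a trailing "ing", "ed" or "s", and stem_fully() is an assumed name for the loop that repeats stemming until the word stops changing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy single-pass stemmer (hypothetical, NOT the swish-e Stem()):
** strips one trailing "ing", "ed" or "s" in place.
** If stripping would leave an empty string, the word is left
** unchanged. Returns 1 if the word changed, 0 otherwise. */
int stem_once(char *word)
{
    static const char *suffixes[] = { "ing", "ed", "s" };
    size_t len = strlen(word);
    size_t i;

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        if (len >= slen && strcmp(word + len - slen, suffixes[i]) == 0) {
            if (len == slen)
                return 0;           /* would stem to "": keep the word */
            word[len - slen] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Repeat stemming until the word will stem no more.
** Terminates because every successful pass shortens the word. */
void stem_fully(char *word)
{
    assert(word != NULL);
    while (stem_once(word))
        ;
}
```

With this toy rule set, "walkings" is reduced in two passes (first to "walking", then to "walk"), while a word that consists only of a suffix, such as "s", comes back unchanged instead of empty.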
My changes were driven by my sysadmin, who would not let me
install it "as is" because of some security holes he perceived.
I've fully commented all of the files I changed, which include
the README and many of the source files. (Those on the CC list
have seen my changes before, but there was one problem with
the original release which core dumped occasionally... please
replace that source with the current version if you still have it.)
The main goal was to "protect" sprintf, strcat, and strcpy from
buffer overflow. Every call now either has a buffer size check or
uses a length-limited replacement routine. This required adding args
to a couple of routines (like Stem())... most of the strcpy's
were fairly safe, but strcat in a loop is a scary thing.
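As a rough illustration of what a length-limited replacement can look like (a sketch only: safe_copy() and safe_append() are hypothetical names, not the routines in the patched source), the caller passes the destination size and the helper never writes past it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without writing more than
** dstsize bytes, always NUL-terminating.
** Returns 1 if the whole string fit, 0 if it was truncated. */
int safe_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dstsize > 0);
    len = strlen(src);
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);
        dst[dstsize - 1] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1);      /* includes the terminating NUL */
    return 1;
}

/* Hypothetical helper: append src to the string already in dst,
** again limited by the total size of the dst buffer.
** Returns 1 if everything fit, 0 if the result was truncated. */
int safe_append(char *dst, size_t dstsize, const char *src)
{
    size_t used, room, len;

    assert(dst != NULL && src != NULL && dstsize > 0);
    used = strlen(dst);
    assert(used < dstsize);         /* dst must already be a valid string */
    room = dstsize - used - 1;      /* bytes left, excluding the NUL */
    len = strlen(src);
    if (len > room) {
        memcpy(dst + used, src, room);
        dst[used + room] = '\0';
        return 0;
    }
    memcpy(dst + used, src, len + 1);
    return 1;
}
```

Calling safe_append() in a loop on an 8-byte buffer stops growing the string at 7 characters and reports the truncation through its return value, which is exactly the case where a bare strcat would run off the end.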
In addition, I added a missing "#include" to index.c (which has
been discussed several times on the email list but never done)
to avoid compiler warnings about function prototypes.
While I was at it, I noticed that the Makefile overrode the
CFLAGS variable... so I fixed that to respect the setting
at the top and THEN fixed all the source so "gcc -Wall" runs
clean (no warnings at all). This involved touching many files
in minor ways, but it WAS WORTH IT! I found several places
where "|" was used instead of "||", etc.
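One reason the "|" versus "||" mix-up matters: bitwise OR always evaluates both operands, while logical OR stops as soon as the left side is true. The sketch below is a made-up example, not code from swish-e; probe(), evals_bitwise(), evals_logical() and is_blank() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

static int calls;   /* counts how often probe() runs */

int probe(void)
{
    calls++;
    return 1;
}

/* Bitwise OR: probe() runs no matter what a is. */
int evals_bitwise(int a)
{
    int r;

    calls = 0;
    r = a | probe();
    (void)r;
    return calls;
}

/* Logical OR: probe() runs only when a is zero. */
int evals_logical(int a)
{
    int r;

    calls = 0;
    r = a || probe();
    (void)r;
    return calls;
}

/* A typical guard: safe with "||" because s[0] is only read when s
** is non-NULL. Written with "|", s[0] would be read for a NULL s. */
int is_blank(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Small self-check of the behaviour described above. */
void check_short_circuit(void)
{
    assert(evals_bitwise(1) == 1);
    assert(evals_logical(1) == 0);
    assert(is_blank(NULL));
}
```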
Finally, and perhaps this will generate the most concern, I put
some #ifdefs in swish.c and added a new Makefile target. You can
now build a "read-only" version of swish-e, called swish-search,
that can read index files but cannot write them. This makes for
a slightly safer environment, where even if a hacker gets past
your CGI he cannot cause the program to attempt writing files.
I didn't do a whiz-bang job, separating source files and all that,
but the CALLS TO routines which merge and create index files
should be gone if you #define INDEX_READ_ONLY (even if some of
the routines are still linked in).
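The idea can be sketched roughly as follows. This is not the actual swish.c: dispatch(), do_search() and do_index() are hypothetical stand-ins, and only the INDEX_READ_ONLY macro name comes from the change described above. The indexing routine is still compiled, but the call to it disappears when the macro is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the read-only variant by defining INDEX_READ_ONLY, for example
** with -DINDEX_READ_ONLY on the compiler command line. */

/* Hypothetical stand-in for the search path: only reads. */
int do_search(const char *query)
{
    (void)query;
    return 0;
}

/* Hypothetical stand-in for the indexing path: would write files.
** It is still compiled and linked in either build. */
int do_index(const char *path)
{
    (void)path;
    return 1;
}

/* Returns 0 for a search, 1 for an index run, -1 for anything the
** build does not support. */
int dispatch(const char *mode, const char *arg)
{
    assert(mode != NULL);
    if (strcmp(mode, "search") == 0)
        return do_search(arg);
#ifndef INDEX_READ_ONLY
    if (strcmp(mode, "index") == 0)
        return do_index(arg);   /* this CALL vanishes in the read-only build */
#endif
    return -1;
}
```

In the read-only build a request for "index" falls through to the error return, so nothing reachable from the command line leads to the file-writing code.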
I couldn't find any discussion of what constitutes a major, minor,
or development version change. I didn't change the INDEXHEADER
version number, but I added my initials to INDEXVERSION and VERSION
as an indicator that this is not an official release.
The revised source, which builds on Solaris 2.7 and FreeBSD, is at
Thanks for helping create this tool. I hope my changes help you also.
Let me know what should happen next! (as far as I know, my first
pass changes never made it to the official release area)
(Bill Moseley's comments are below)
#define REQMETANAME 1
/* Set to 1 to not index any Meta tag contents unless the tag name is
** listed in the MetaNames parameter. Set to 0 and with OKNOMETA set 1
** Swish will place META contents in index with no metaName attached.
** 10/11/99 Bill Moseley
** Added wordchars and ignorefirst and last chars to header of index file
** 11/23/99 - Bill Moseley
** 10/10/99 & 11/23/99 - Bill Moseley
** - Changed to stem words *before* expanding with expandstar
** so can find words in the index
** - Moved META tag check before expandstar so META names don't get
** Stem returns original word if word stems to empty string
** Bill Moseley 10/11/99
** Repeats stemming until word will stem no more
** Bill Moseley 10/17/99
** function: EndsWithCVC patched a bug Moseley 10/19/99
Received on Wed Feb 23 14:09:02 2000