> On Tuesday, July 15, 2003, at 04:19 AM, michelle jenkins wrote:
> > 1. The software must be able to count the occurrence
> > of each word in each record in a number of fields:
Yes. The index stores word frequency (and word position) for each
filename and each metaname (field).
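Here is a minimal Python sketch of a positional inverted index that shows what is being counted. It is keyed by word, then by (filename, metaname). It assumes lower-casing and whitespace tokenization, and the function names and record layout are hypothetical. It is not Swish-e's actual index format.

```python
from collections import defaultdict


def build_index(records):
    """Build word -> (filename, metaname) -> list of 0-based word positions.

    `records` is a hypothetical layout: {filename: {metaname: text}}.
    """
    index = defaultdict(lambda: defaultdict(list))
    for filename, fields in records.items():
        for metaname, text in fields.items():
            # Positions are counted separately within each field.
            for position, word in enumerate(text.lower().split()):
                index[word][(filename, metaname)].append(position)
    return index


def word_frequency(index, word, filename, metaname):
    """Number of times `word` occurs in one field of one record."""
    # .get() avoids creating empty entries in the defaultdicts.
    return len(index.get(word, {}).get((filename, metaname), []))
```

For example, with `{"rec1.txt": {"abstract": "expression of yeast genes and yeast proteins"}}`, the word "yeast" is stored at positions 2 and 5, so its frequency in that abstract is 2.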
> > 2. The software must be able to count the record
> > occurrence (the total number of unique records that
> > contain each word).
Well, yes, that's what Swish does. You search for a word and it tells
you the list of files that contain that word.
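The record occurrence of a word is then the size of its set of matching files, often called its document frequency. Here is a minimal sketch that assumes the records arrive as a hypothetical mapping of filename to plain text and uses whitespace tokenization:

```python
from collections import defaultdict


def build_file_sets(records):
    """Map each word to the set of record names (files) that contain it."""
    postings = defaultdict(set)
    for name, text in records.items():
        for word in text.lower().split():
            # A set ignores repeats within one record.
            postings[word].add(name)
    return postings


def record_occurrence(postings, word):
    """Number of unique records containing `word` (its document frequency)."""
    return len(postings.get(word.lower(), set()))
```

A word repeated inside one record still counts that record only once. That is the difference between this count and the per-field frequency above.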
> > 3. The software must be able to identify frequently
> > occurring phrases (ideally including hyphenated words)
> > or word co-occurrence within records and fields
You can search for phrases, but Swish does that only by matching the
individual words against their stored word positions. No pre-processing
of phrases is done at indexing time.
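Phrase matching on word positions can be sketched as follows. The code finds each position of the first word, then checks that every following word occurs at the next position in the same file. This is a simplified illustration with hypothetical names and whitespace tokenization, not Swish-e's own code.

```python
from collections import defaultdict


def build_positions(records):
    """Map word -> filename -> list of 0-based word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][name].append(pos)
    return index


def phrase_search(index, phrase):
    """Return the sorted files in which the words of `phrase` are adjacent."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    for name, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word k of the phrase must sit at position start + k.
            if all(start + offset in index.get(w, {}).get(name, [])
                   for offset, w in enumerate(words[1:], start=1)):
                hits.append(name)
                break  # one match per file is enough
    return sorted(hits)
```

Because nothing is pre-computed, "breast cancer" matches "lung cancer and breast cancer" but not "cancer of the breast". Finding *frequently occurring* phrases would take a separate pass, such as counting adjacent word pairs.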
> > 4. The software must be able to allow the import of
> > MEDLINE records consisting of title, abstract, journal
> > and MeSH
No problem.
> > 5. The software must be able to remove stop words
> > at the user’s discretion
Yes.
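Here is a rough illustration of user-controlled stop-word removal. The function name and the whitespace tokenization are assumptions, and this does not show how Swish-e itself is configured.

```python
def tokenize_with_stopwords(text, stop_words=()):
    """Lower-case and split `text`, dropping any user-chosen stop words.

    With the default empty list, every word is kept. Matching is
    case-insensitive.
    """
    stops = {w.lower() for w in stop_words}
    return [w for w in text.lower().split() if w not in stops]
```

For example, `tokenize_with_stopwords("The effect of Aspirin on the heart", {"the", "of", "on"})` keeps only "effect", "aspirin", and "heart".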
> > Obviously I'm hoping to evaluate the packages myself
> > before deciding. Previous research has used WordStat,
> > the bibliographic software Idealist and SWISH in a
> > hypertext/full-text environment. One of the major
> > limitations of these packages was their inability to
> > analyse phrases (multi-term controlled vocabulary).
Swish has a "buzzword" feature that might work in some cases, although I
don't think you can use it for phrases that contain white space. It's
really more useful for words that contain characters that wouldn't
normally be indexed (e.g. C++).
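The buzzword idea can be sketched with a toy tokenizer. Ordinary words are runs of letters and digits, so a character like "+" is dropped. A whitespace-delimited chunk that exactly matches a listed buzzword is kept whole instead. Because matching happens one chunk at a time, a multi-word entry such as "gene therapy" never matches, which mirrors the white-space limitation described above. The function and its punctuation rules are hypothetical and are not Swish-e's actual parser.

```python
import re

# Ordinary index terms: runs of lower-case letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")
# Punctuation trimmed from a chunk before comparing it to the buzzword list.
TRIM = ".,;:!?()\"'"


def tokenize_with_buzzwords(text, buzzwords=()):
    """Split `text` into index terms, keeping listed buzzwords intact."""
    buzz = {b.lower() for b in buzzwords}
    terms = []
    for chunk in text.lower().split():
        stripped = chunk.strip(TRIM)
        if stripped in buzz:
            terms.append(stripped)  # e.g. "c++" survives as one term
        else:
            terms.extend(WORD_RE.findall(chunk))
    return terms
```

Without a buzzword list, "Written in C++, not C." indexes "c" twice. With `buzzwords=["C++"]`, the first of those becomes "c++".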
--
Bill Moseley
moseley@hank.org
Received on Tue Jul 15 15:02:31 2003