At 06:41 AM 2/25/2002 -0800, Gaye Karagulle wrote:
>I am going to develop a library program in visual
>basic, that does indexing using "vector space model"
>and I need to find the words and their corresponding
>frequencies, of each document in my database, in order
>to create vectors for each document. And stemming
>should be done meanwhile, namely, "run" "runs" and
>"running"..etc should be counted as the same word. The
>word frequencies will be used as weights in the
>can I create these document vectors using swish-e? if
Not sure I'm following what you want. Doesn't sound like you need a search
engine.
Do you need to find the documents or are you just interested in word
frequency?
If just frequency then I'd probably just parse, stem, and tally up the
counts. Not sure why you would need swish.
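
Something along these lines is what I mean by parse, stem, and tally. This is only a rough sketch, in Python rather than Visual Basic. The `crude_stem` function is a made-up stand-in that only handles cases like "runs" and "running"; it is not a real stemming algorithm such as Porter's, and the sample documents and their names are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Hypothetical toy stemmer: strips "-ing" and plural "-s" only.

    A real stemmer (for example Porter's) covers far more cases.
    """
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def term_frequencies(text):
    """Return a Counter mapping each stemmed word to its count."""
    return Counter(crude_stem(word) for word in tokenize(text))


if __name__ == "__main__":
    # Invented sample documents, keyed by hypothetical names.
    documents = {
        "doc1": "Run, runs, running. The dog runs.",
        "doc2": "The library indexes documents.",
    }
    for name, text in documents.items():
        print(name, dict(term_frequencies(text)))
```

Each Counter is effectively the raw term-frequency vector for one document; how you weight or normalize it after that is up to you.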
With swish you can use some of the -T options to dump the index which will
probably give you word counts, I suppose. -T index_words_full will tell
you the frequency of each word, but it's a lot of output to parse.
Received on Mon Feb 25 14:58:07 2002