Re: Phrase search

From: Jose Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Wed Mar 29 2000 - 15:36:00 GMT
I have made some minor changes in swish-e to store 
word positions and frequency in the index file.

These changes affect three files:

index.c -> Main changes
index.h -> A few changes to some prototypes
swish.h -> Structure location modified.

Now the index file gets bigger because the frequency
and the positions of the word in the file are stored in it.
Each word entry is stored in the following way:

this: 1 7812 47 1 5 5 10 15 21 31

The 1st to 4th numbers are the old values. The 5th is
the word frequency and the rest are the word positions.
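
To make the layout concrete, here is a minimal sketch that splits one such
entry into its parts. It is written in Python for brevity and is not the code
in index.c; the function name parse_entry is hypothetical, and the check that
the frequency equals the number of positions is an assumption drawn only from
the example line.

```python
def parse_entry(line):
    """Split one index entry into word, old values, frequency and positions."""
    word, _, rest = line.partition(":")
    numbers = [int(tok) for tok in rest.split()]
    old_values = numbers[:4]   # the four values the old format already had
    frequency = numbers[4]     # 5th number: word frequency
    positions = numbers[5:]    # remaining numbers: word positions
    # Assumption: the frequency equals the number of stored positions,
    # as it does in the example entry.
    if len(positions) != frequency:
        raise ValueError("frequency does not match number of positions")
    return word.strip(), old_values, frequency, positions


entry = parse_entry("this: 1 7812 47 1 5 5 10 15 21 31")
```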

Now comes the hard part to code:
- How to search for an exact phrase? I think that searching
in the same way freewais-sf does could be a good idea (using
the character ' as delimiter). For example: 'Berkeley University'.
Any more ideas?
- Since the positions are stored, it is possible to search for a word
that is within n positions of another one ("near"). For example:
Berkeley near[2] University
will return both Berkeley University and University of Berkeley.
Any more ideas? 
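
Both ideas reduce to comparing the stored position lists of the words in one
file. The following is a rough Python sketch, not swish-e code. It assumes
that positions count words from the start of the file, that a phrase means
consecutive positions in the given order, and that near[n] means a distance of
at most n in either order. The function names and the sample positions are
made up for illustration.

```python
def phrase(position_lists):
    """True if the words occur at consecutive positions, in the given order."""
    first, *rest = position_lists
    later = [set(p) for p in rest]
    return any(
        all(start + i + 1 in s for i, s in enumerate(later))
        for start in first
    )


def near(positions_a, positions_b, n):
    """True if some occurrence of A is within n positions of some B."""
    return any(abs(a - b) <= n for a in positions_a for b in positions_b)


# Hypothetical positions for two small files.
doc1 = {"berkeley": [1], "university": [2]}   # "Berkeley University"
doc2 = {"university": [1], "berkeley": [3]}   # "University of Berkeley"

for doc in (doc1, doc2):
    print(phrase([doc["berkeley"], doc["university"]]),
          near(doc["berkeley"], doc["university"], 2))
```

With these sample positions the phrase test matches only the first file, while
near[2] matches both.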

Both things require modifying the parser, so any comments will
be appreciated.
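
As a starting point for discussion, the two query forms could be recognized by
a tokenizer along these lines. This is only a Python sketch under assumptions:
the token names PHRASE, NEAR and WORD are invented here, and the real swish-e
parser is written in C and is structured differently.

```python
import re

# A quoted phrase, a near[n] operator, or any other run of non-blanks.
TOKEN = re.compile(r"'([^']*)'|near\[(\d+)\]|(\S+)")


def tokenize(query):
    """Turn a query string into (kind, value) pairs."""
    tokens = []
    for match in TOKEN.finditer(query):
        quoted, distance, word = match.groups()
        if quoted is not None:
            tokens.append(("PHRASE", quoted.split()))
        elif distance is not None:
            tokens.append(("NEAR", int(distance)))
        else:
            tokens.append(("WORD", word))
    return tokens


print(tokenize("'Berkeley University'"))
print(tokenize("Berkeley near[2] University"))
```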

Minor improvement:
- I think it is possible to get better compression if the
positions of the words are stored incrementally, that is, each
position as the difference from the previous one. For example:
Original sequence of positions: 25 366 598 2345 2500
Incremental sequence of positions: 25 341 232 1747 155
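
The conversion in both directions is short. Here is a Python sketch (the
function names are hypothetical, and this is not swish-e code). Note that the
smaller numbers only save space if they are written in a form where small
values take less room, such as decimal text or a variable-length byte
encoding.

```python
from itertools import accumulate


def to_deltas(positions):
    """Store each position as its distance from the previous one."""
    return [cur - prev for prev, cur in zip([0] + positions, positions)]


def from_deltas(deltas):
    """Rebuild the absolute positions with a running sum."""
    return list(accumulate(deltas))


original = [25, 366, 598, 2345, 2500]
deltas = to_deltas(original)
restored = from_deltas(deltas)
```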

I would like to know whether this work would be useful to
other people.

Have a nice day 

Jose Ruiz

jmruiz@boe.es

Jefe de Area Informatica
Boletin Oficial del Estado
Manoteras 54
Madrid 28050
Spain
Received on Wed Mar 29 10:37:02 2000