I think a good, integrated solution would be to add one piece of information to the index for each word in a document: its position in the document (counting words from the beginning).
This requires modifying the indexing process and using this information, when needed, during search: a phrase search means searching for words whose positions in the matching documents differ by exactly 1.
This can be extended to a proximity search: search for words that are no more than N positions apart. The word position stored in the index can be expressed as the word position in the full document, or as sentence position in the document + word position in the sentence, or as paragraph position + sentence position + word position (which requires a good way to detect sentence/paragraph breaks).
All this could nearly double the size of the per-file information in the index, but it would enable a much-requested search feature.
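To make the idea concrete, here is a minimal sketch in Python of an index that stores word positions, with a search for two words at most N positions apart. It is not Swish-e code: the names `tokenize`, `build_index` and `proximity_search` are made up for illustration, the tokenizer is deliberately naive, and requiring the second word to come after the first is my own assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]}, counting words from 0."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def proximity_search(index, first, second, max_distance=1):
    """Return ids of documents where `second` follows `first` within max_distance words.

    max_distance=1 is a two-word phrase search; a larger value is a proximity search.
    """
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    hits = set()
    # Only documents containing both words can match.
    for doc_id in first_postings.keys() & second_postings.keys():
        if any(0 < q - p <= max_distance
               for p in first_postings[doc_id]
               for q in second_postings[doc_id]):
            hits.add(doc_id)
    return hits


docs = {
    "a.html": "The black cat sat",
    "b.html": "A cat, black as night",
    "c.html": "black and white cat",
}
index = build_index(docs)
phrase_hits = proximity_search(index, "black", "cat")
near_hits = proximity_search(index, "black", "cat", max_distance=3)
```

With these three sample documents, `phrase_hits` contains only `a.html`, and `near_hits` also picks up `c.html`. The position lists are the extra data that would enlarge the index.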
I'm laying all this out because many questions about phrase search keep coming in, and I know my users will definitely ask for this feature within 6 months...
From: email@example.com [SMTP:firstname.lastname@example.org]
Date: Wednesday 16 February 2000 18:38
To: Multiple recipients of list
Subject: [SWISH-E] RE: Exact Phrase search
You can't do it formally. You will have to post process the result. It's
as if you added a new filter that identifies matching documents. Swish-e
will already have narrowed down the search though.
Suppose you look for "Black Cat". You have swish-e look for black and cat
(not case sensitive).
Out of the resulting pages, you will have to do a new search on each
document to match "Black Cat".
That means that all occurrences of black that do not match the case or are not
close to each other will have to be removed. Then you will show the
Finally, notice that an exact match should not find a search phrase
broken across 2 lines... but you may wish to find it anyway.
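The post-processing step described above could be sketched like this in Python. It assumes the text of each document returned by the engine is already available as a string. The function `phrase_filter` and its parameters are hypothetical names, not part of Swish-e, and the `span_lines` switch is one possible way to handle a phrase broken across two lines.

```python
import re


def phrase_filter(phrase, candidates, case_sensitive=True, span_lines=False):
    """Keep only the candidate documents whose text contains the phrase.

    candidates: {doc_id: text} for documents the search engine already matched.
    span_lines: if True, any run of whitespace (including a line break)
                between the words is accepted; otherwise exactly one space.
    """
    separator = r"\s+" if span_lines else " "
    pattern = r"\b" + separator.join(re.escape(w) for w in phrase.split()) + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [doc_id for doc_id, text in candidates.items() if regex.search(text)]


# Documents that an AND search for "black" and "cat" would have returned.
candidates = {
    "1": "A Black Cat here",
    "2": "a black cat there",
    "3": "Black\nCat",
    "4": "Cat is Black",
}
exact = phrase_filter("Black Cat", candidates)
any_case = phrase_filter("Black Cat", candidates, case_sensitive=False)
across_lines = phrase_filter("Black Cat", candidates, span_lines=True)
```

Here `exact` keeps only document 1, `any_case` keeps 1 and 2, and `across_lines` keeps 1 and 3. Document 4 contains both words but never the phrase, so it is always dropped.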
From: email@example.com [mailto:firstname.lastname@example.org]
Sent: 16 February 2000 17:58
Subject: [SWISH-E] Exact Phrase search
I have one big problem: as you know, Swish-e doesn't search for
exact phrases, but I absolutely need that....
Has anybody done Exact Phrase Search???? Or anything similar????
<< File: Stephane Meier (E-mail).vcf>>
Received on Wed Feb 16 13:09:19 2000