Re: [swish-e] Problem with unique word

From: Bharatwaj Narayanan Iyengar <Bharatwaj_Iyengar(at)not-real.infosys.com>
Date: Mon May 31 2010 - 08:52:55 GMT
Hi
I hope this helps

This document explains several search options in detail. The search tool is based on SWISH-E<http://swish-e.org/>, so "SWISH-E" and "search tool" are used interchangeably, and all of these options belong to the swish-e tool.
Boolean Operators
You can use the Boolean operators and, or, near or not in searching. Without these Boolean operators "Swish-e" will assume you're and'ing the words together. The operators are not case sensitive. These three searches are the same:
        Foo Bar
        bar foo
        foo and Bar
The not operator inverts the results of a search.
        not foo
finds all the documents that do not contain the word foo.
Parentheses can be used to group searches
        not (foo and bar)
The result is all documents that contain neither term or only one of them, but not both.
To search for the words and, or, near or not, place them in escaped double quotes:
        \"not\"
        \"near\"
Other examples:
        smilla or snow
Retrieves files containing either the words "smilla" or "snow".
        smilla snow not sense
        (smilla and snow) and not sense
The near keyword is similar to and but implies a proximity between the words. The near keyword takes an integer argument as well, indicating the maximum distance between two words for them to be considered a valid match.
        smilla near5 snow
would match the document if the words smilla and snow appeared within 5 positions of one another.
A near search with no argument or argument of 0 is the same as an and search.
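As a rough illustration of the proximity rule described above, here is a minimal Python sketch. It is not how swish-e is implemented: the whitespace tokenisation, the function names and the reading of "within N positions" as an absolute difference of word positions are all assumptions made for the example.

```python
def word_positions(text, word):
    """Return the 0-based positions at which `word` occurs in `text`.

    Assumption: words are split on whitespace and compared case-insensitively.
    """
    tokens = text.lower().split()
    return [i for i, tok in enumerate(tokens) if tok == word.lower()]


def near(text, first, second, max_distance=0):
    """Sketch of a 'first nearN second' match against a single document."""
    pos_a = word_positions(text, first)
    pos_b = word_positions(text, second)
    if max_distance <= 0:
        # No argument, or an argument of 0: behaves like a plain "and".
        return bool(pos_a) and bool(pos_b)
    # Match if any pair of occurrences is at most max_distance positions apart.
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


doc = "smilla has a feeling for snow"
within_five = near(doc, "smilla", "snow", 5)   # positions 0 and 5
within_four = near(doc, "smilla", "snow", 4)
plain_and = near(doc, "smilla", "snow")
```

With this sample sentence the two words are five positions apart, so the near5 check succeeds and a hypothetical near4 check does not.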
Wildcards
Two different wildcard characters are available, each evoking different behaviour.
The * means "match zero or more characters."
The ? means "match exactly one character."
The wildcard * may only be used at the end of a word. Otherwise * is considered a normal character (i.e. can be searched for if included in the WordCharacters directive).
Example:
        librarian
this query only retrieves files which contain the given word.
        librarian*
retrieves "librarians", "librarianship", etc. along with "librarian".
The ? wildcard matches exactly one character, but may not be used at the start of a word.
Example:
        s?ow
will match snow, slow and show, but not strow.
This:
        ?how
will throw an error.
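The wildcard rules above can be mimicked with regular expressions. The following is only a sketch for experimenting with the rules, not swish-e's own matching code; the function name is made up, "one character" is approximated by the regex ".", and case-insensitive matching is an assumption based on the earlier examples treating Foo and foo alike.

```python
import re


def wildcard_to_regex(term):
    """Translate a search word with * and ? into a compiled regex (sketch)."""
    if term.startswith("?"):
        # Mirrors the rule that ? may not be used at the start of a word.
        raise ValueError("'?' may not be used at the start of a word")
    # A trailing * means "zero or more characters"; anywhere else it is literal.
    has_trailing_star = term.endswith("*")
    body = term[:-1] if has_trailing_star else term
    parts = []
    for ch in body:
        if ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal, including a non-final *
    pattern = "".join(parts) + (".*" if has_trailing_star else "")
    return re.compile(pattern, re.IGNORECASE)


def matches(term, word):
    """True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


hits = [w for w in ["snow", "slow", "show", "strow"] if matches("s?ow", w)]
```

Here hits contains snow, slow and show, while strow is rejected because ? stands for exactly one character.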
Order of Evaluation
In general, the order of evaluation is not important. Internally swish-e processes the search terms from left to right. Parentheses can be used to group searches together, effectively changing the order of evaluation. For example these three are the same:
        foo not bar baz
        not bar foo baz
        baz foo not bar
but these two are not the same:
        foo not bar baz
        foo not (bar baz)
The first finds all documents that contain both foo and baz, but do not contain bar. The second finds all documents that contain foo and do not contain both bar and baz; they may contain one of the two, or neither.
It is often helpful in understanding searches to use the boolean terms and parentheses. So the above two become:
        foo AND (not bar) AND baz
        foo AND (not (bar AND baz))
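The difference between the two groupings can be checked with ordinary set operations. This is a minimal sketch under the assumption that each boolean operator behaves like the matching set operation; the file names and their word lists are invented for the example.

```python
# Hypothetical mini-collection: file name -> set of words it contains.
docs = {
    "a.txt": {"foo", "baz"},
    "b.txt": {"foo", "bar", "baz"},
    "c.txt": {"foo", "bar"},
    "d.txt": {"foo"},
    "e.txt": {"bar", "baz"},
}
all_docs = set(docs)


def having(word):
    """Set of files that contain the word."""
    return {name for name, words in docs.items() if word in words}


def negate(result):
    """'not' inverts a result: every file that is not in it."""
    return all_docs - result


# foo AND (not bar) AND baz
first = having("foo") & negate(having("bar")) & having("baz")

# foo AND (not (bar AND baz))
second = having("foo") & negate(having("bar") & having("baz"))
```

With this data, first holds only a.txt, while second holds a.txt, c.txt and d.txt. Note that d.txt, which has foo but neither bar nor baz, is part of the second result.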
These four examples are all the same search:
        juliet not ophelia and pac
        (juliet) AND (NOT ophelia) AND (pac)
        juliet not ophelia pac
        pac and juliet and not Ophelia
Looking at the first three searches, Swish-e first finds all the documents with "juliet". Then it finds all documents that do not contain "ophelia". Those two lists are then combined with the boolean AND operator, resulting in a list of documents that include "juliet" but not "ophelia". Finally, that list is ANDed with the list of documents that contain "pac".
However, it is always possible to force the order of evaluation by using parentheses. For example:
        juliet not (ophelia and pac)
retrieves files with "juliet" that do not contain both words "ophelia" and "pac".
Meta Tags
MetaNames are used to represent fields (called columns in a database) and provide a way to search in only parts of a document.
To limit a search to words found in a meta tag you prefix the keywords with the name of the meta tag, followed by the equal sign:
        metaname= word
        metaname= (this or that)
        metaname= ( (this or that) or "this phrase" )
It is not necessary to have spaces at either side of the "=", consequently the following are equivalent:
        metaName=word
        metaName = word
        metaName= word
To search on a word that contains a "=", precede the "=" with a "\" (backslash).
        test\=3 = x\=4 or y\=5
this query returns the files where the word "x=4" is associated with the metaName "test=3" or that contains the word "y=5" not associated with any metaName.
Queries can be also constructed using any of the usual search features, moreover metaName and plain search can be mixed in a single query.
        metaName1 = (a1 or a4) not (a3 and a7)
This query will retrieve all the files in which "a1" or "a4" is found in the META tag "metaName1" and that do not contain both of the words "a3" and "a7", where "a3" and "a7" are not associated with any meta name.
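If queries are assembled by a program, the "=" escaping rule is easy to get wrong. Below is a small Python sketch of a query builder; escape_equals and meta_query are hypothetical helper names, not part of swish-e, and the sketch only handles a single word per meta name.

```python
def escape_equals(text):
    """Precede every '=' with a backslash so it is read as part of the word."""
    return text.replace("=", "\\=")


def meta_query(meta_name, word):
    """Build 'metaname=word', escaping '=' inside either part.

    No spaces are put around the separating '=', which the text above
    describes as equivalent to the spaced forms.
    """
    return f"{escape_equals(meta_name)}={escape_equals(word)}"


# Rebuilds the earlier example: word "x=4" under meta name "test=3",
# or the word "y=5" outside any meta name.
query = meta_query("test=3", "x=4") + " or " + escape_equals("y=5")
print(query)
```

The printed string has the same shape as the hand-written example above, apart from the optional spaces around the separating "=".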
Phrase Searching
To search for a phrase in a document use double-quotes to delimit your search terms.
        \"this is a phrase\" or (this and that)
You cannot use boolean search terms inside a phrase. That is:
        this and that
finds documents with both words "this" and "that", but:
        \"this and that\"
finds documents that have the phrase "this and that". A phrase can consist of a single word, so this is how to search for the words used as boolean operators:
        this \"and\" that
finds documents that contain all three words, but in any order.
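When the query is built from a script, the quoting can be done by a helper. The sketch below is an assumption-laden illustration: the helper names are invented, the backslashes in the examples above are taken to be shell escaping (so they are not needed when arguments are passed as a list), and the index file name is a placeholder. The command is only constructed, not run.

```python
import re

# Words that the query parser would read as operators (near may carry a number).
OPERATOR_RE = re.compile(r"and|or|not|near\d*", re.IGNORECASE)


def literal_terms(words):
    """Join words into a query, double-quoting any that look like operators."""
    quoted = []
    for word in words:
        if OPERATOR_RE.fullmatch(word):
            quoted.append(f'"{word}"')   # single-word phrase, searched literally
        else:
            quoted.append(word)
    return " ".join(quoted)


def phrase(text):
    """Wrap a run of words in double quotes so it is searched as a phrase."""
    return f'"{text}"'


any_order = literal_terms(["this", "and", "that"])   # this "and" that
exact = phrase("this and that")                      # "this and that"

# Argument list for swish-e's -w (query) and -f (index file) switches.
# "index.swish-e" is a placeholder name for this example.
command = ["swish-e", "-f", "index.swish-e", "-w", any_order]
```

Passing the query as a single list element, for example to subprocess.run, keeps the double quotes intact without any extra escaping.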
Thanks and regards,
Bharatwaj

From: users-bounces@lists.swish-e.org [mailto:users-bounces@lists.swish-e.org] On Behalf Of Franck Dupont
Sent: Monday, May 31, 2010 1:51 PM
To: users@lists.swish-e.org
Subject: [swish-e] Problem with unique word

Hello,

I've a list of files with towns :

1.txt => "Saint-Sulpice"
2.txt => "Saint"
3.txt => "Saint-Exupery"
4.txt => "Saint-Just"

My problem is that when I search for the single word "Saint", the file 2.txt (which contains only this word) is not the first result.

The other files containing "Saint" come before 2.txt.

How can I solve this problem ?

Thanks !

Franck.

Received on Mon May 31 04:53:12 2010