Ok, I took a few minutes to rewrite the search descriptions. My doc
writing skills stink so I wanted to post here for review. I'm sure someone
will find a few commas to remove. My use of foo and bar is a bit geeky, I suppose.
I left in some parts that probably could be removed. I think the
shorter the better.
Searching Syntax and Operations
The "-w" command line argument is used to pass the search query to
Swish-e. For example,
swish-e -w airplane
will find all documents that contain the word airplane.
When running Swish-e from a shell prompt, be careful to protect your
query from shell metacharacters and shell expansions. This often means
placing single or double quotes around your query. See "Searching with
Perl" if you plan to use Perl as a front end to Swish-e. In the examples
below, single quotes are used where needed to protect the search from the shell.
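If you build swish-e command lines from a script instead of typing them, the quoting can be automated. Here is a minimal Python sketch, assuming a POSIX-style shell such as sh or bash; the helper name `swish_command` is hypothetical. It relies on the standard library's `shlex.quote`, which adds single quotes only when the string contains characters the shell would interpret.

```python
import shlex

def swish_command(query: str) -> str:
    """Hypothetical helper: build a shell command line passing `query` to swish-e -w."""
    # shlex.quote leaves "safe" strings alone and wraps anything containing
    # shell metacharacters (spaces, parentheses, '*', quotes) in single quotes.
    return "swish-e -w " + shlex.quote(query)

for q in ["airplane", "not (foo and bar)", "librarian*", '"not"']:
    print(swish_command(q))
# swish-e -w airplane
# swish-e -w 'not (foo and bar)'
# swish-e -w 'librarian*'
# swish-e -w '"not"'
```

If the command is run from Python rather than printed, `subprocess.run(["swish-e", "-w", query])` passes the query as a single argument without involving a shell at all, so no quoting is needed.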
The following section describes various aspects of searching with
Swish-e.
Boolean Operators
You can use the Boolean operators "and", "or", or "not" in searches. If no
operator appears between two words, Swish-e assumes you're and'ing them
together. The operators are not case sensitive. These three searches are
the same:
swish-e -w foo bar
swish-e -w bar foo
swish-e -w foo AND bar
[Note: you can change the default to or'ing by changing the variable
DEFAULT_RULE in the config.h file and recompiling Swish-e.]
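One way to picture the implicit AND is as an intersection of the document lists for each word, while an OR default would be a union. The following is a toy Python model of that behavior, not how Swish-e is implemented; the index contents are made up, and the `default_rule` parameter is a hypothetical stand-in for the DEFAULT_RULE setting.

```python
# Toy inverted index (hypothetical data): word -> set of documents containing it.
INDEX = {
    "foo": {"a.html", "b.html"},
    "bar": {"b.html", "c.html"},
}

def search_words(words, default_rule="and"):
    """Combine the document sets for a non-empty list of words.

    `default_rule` mimics the effect of DEFAULT_RULE: "and" intersects, "or" unions.
    """
    postings = [INDEX.get(word, set()) for word in words]
    if default_rule == "and":
        return set.intersection(*postings)
    return set.union(*postings)

print(sorted(search_words(["foo", "bar"])))                     # ['b.html']
print(sorted(search_words(["bar", "foo"])))                     # ['b.html']
print(sorted(search_words(["foo", "bar"], default_rule="or")))  # ['a.html', 'b.html', 'c.html']
```

Since intersection does not depend on order, "foo bar" and "bar foo" give the same result.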
The not operator inverts the results of a search.
swish-e -w not foo
finds all the documents that do not contain the word foo.
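In set terms, "not" returns the complement of a word's documents within the whole collection, which is why a bare "not foo" can return most of the index. A minimal Python sketch with hypothetical data, modeling the semantics only:

```python
# Hypothetical collection: every indexed document, plus the documents containing "foo".
ALL_DOCS = {"a.html", "b.html", "c.html", "d.html"}
INDEX = {"foo": {"a.html", "b.html"}}

def search_not(word):
    """Documents that do NOT contain `word`: the complement within ALL_DOCS."""
    return ALL_DOCS - INDEX.get(word, set())

print(sorted(search_not("foo")))  # ['c.html', 'd.html']
print(sorted(search_not("zzz")))  # every document, since none contain "zzz"
```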
Parentheses can be used to group searches.
swish-e -w 'not (foo and bar)'
The result is all documents that contain neither term or only one of
them, but not both.
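The four possible cases for a single document can be listed as a truth table. This Python sketch simply evaluates the query as a Boolean expression over "does the document contain foo / bar":

```python
from itertools import product

def matches(has_foo, has_bar):
    """Python version of the query 'not (foo and bar)' for one document."""
    return not (has_foo and has_bar)

for has_foo, has_bar in product([False, True], repeat=2):
    print("foo:", has_foo, "bar:", has_bar, "->", matches(has_foo, has_bar))
```

Only the last row (the document contains both words) is excluded.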
To search for the words "and", "or", or "not", place them in double quotes.
Remember to protect the quotes from the shell:
swish-e -w '"not"'
swish-e -w \"not\"
will search for the word "not".
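To confirm that both command lines hand Swish-e the same argument, you can parse them with Python's `shlex.split`, which follows POSIX shell quoting rules. This is a sketch that assumes a POSIX-style shell such as sh or bash:

```python
import shlex

# The two command lines from the example, parsed with POSIX shell rules.
single_quoted = "swish-e -w '\"not\"'"   # swish-e -w '"not"'
backslashed = r'swish-e -w \"not\"'      # swish-e -w \"not\"

print(shlex.split(single_quoted))  # ['swish-e', '-w', '"not"']
print(shlex.split(backslashed))    # ['swish-e', '-w', '"not"']
```

In both cases the shell removes its own quoting, and Swish-e receives `"not"` with the double quotes intact.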
Other examples:
swish-e -w smilla or snow
Retrieves files containing "smilla", "snow", or both.
swish-e -w smilla snow not sense
swish-e -w '(smilla and snow) and not sense' (same thing)
first retrieves the files that contain both the words "smilla" and
"snow", then keeps only those that do not contain the word "sense".
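With made-up document lists, the same steps map onto set operations: "or" is a union, "and" is an intersection, and "and not" removes documents. A Python sketch of the semantics only:

```python
# Hypothetical index: word -> documents containing it.
INDEX = {
    "smilla": {"d1", "d2", "d3"},
    "snow": {"d2", "d3", "d4"},
    "sense": {"d3"},
}

either = INDEX["smilla"] | INDEX["snow"]  # smilla or snow
both = INDEX["smilla"] & INDEX["snow"]    # smilla and snow
result = both - INDEX["sense"]            # (smilla and snow) and not sense

print(sorted(either))  # ['d1', 'd2', 'd3', 'd4']
print(sorted(both))    # ['d2', 'd3']
print(sorted(result))  # ['d2']
```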
Truncation
The wildcard (*) is available; however, it can only be used at the end of
a word. Anywhere else it is treated as a normal character (i.e. it can be
searched for if it is included in the WordCharacters directive).
swish-e -w librarian
This query retrieves only files that contain the exact word "librarian".
On the other hand:
swish-e -w 'librarian*'
retrieves "librarians", "librarianship", etc. along with "librarian".
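A trailing wildcard behaves like a prefix match against the words stored in the index. The Python sketch below models that rule over a hypothetical word list; it is an illustration of the behavior described above, not Swish-e's code.

```python
# Hypothetical list of words stored in an index.
INDEXED_WORDS = ["librarian", "librarians", "librarianship", "library", "libraries"]

def match_words(pattern):
    """Return the indexed words matched by `pattern`.

    A trailing '*' means "any word starting with this prefix"; a '*' anywhere
    else is compared literally, like any other character.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [w for w in INDEXED_WORDS if w.startswith(prefix)]
    return [w for w in INDEXED_WORDS if w == pattern]

print(match_words("librarian"))   # ['librarian']
print(match_words("librarian*"))  # ['librarian', 'librarians', 'librarianship']
print(match_words("lib*rian"))    # [] because '*' is literal here
```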
Note that wildcard searches combined with word stemming can lead to
unexpected results. If stemming is enabled, a search term with a
wildcard is stemmed before searching. So searching for "running*" is
actually a search for "run*", which means "running*" would also find
"runway". Also, searching for "runn*" will not find "running" as you
might expect: "running" is stored in the index as its stem "run", and
"runn*" does not match "run".
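The interaction can be reproduced with a toy model: stem the words at index time, stem the query term, then apply the prefix match. The stemmer here is a hypothetical lookup table standing in for a real stemming algorithm, so only the two cases from the text are meaningful.

```python
# Hypothetical stemmer: a lookup table standing in for a real stemming algorithm.
STEMS = {"running": "run", "runs": "run"}

def stem(word):
    return STEMS.get(word, word)

# With stemming enabled, words are stored in the index in stemmed form.
INDEXED = sorted({stem(w) for w in ["run", "running", "runs", "runway"]})

def stemmed_wildcard(pattern):
    """Stem the search term first, then apply the trailing-'*' prefix match."""
    prefix = stem(pattern.rstrip("*"))
    return [w for w in INDEXED if w.startswith(prefix)]

print(INDEXED)                       # ['run', 'runway']
print(stemmed_wildcard("running*"))  # ['run', 'runway']
print(stemmed_wildcard("runn*"))     # []
```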
Order of Evaluation
In general, the order of evaluation is not important. Internally, Swish-e
processes the search terms from left to right. Parentheses can be used
to group searches together, effectively changing the order of
evaluation. For example, these three are the same:
swish-e -w foo not bar baz
swish-e -w not bar foo baz
swish-e -w baz foo not bar
but these two are not the same:
swish-e -w foo not bar baz
swish-e -w 'foo not (bar baz)'
The first finds all documents that contain both foo and baz, but do not
contain bar. The second finds all documents that contain foo but do not
contain both bar and baz (they may contain one of the two, or neither).
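To see how left-to-right processing and implicit AND produce these results, here is a small hypothetical evaluator written in Python. It is a sketch, not Swish-e's parser: it assumes "and" and "or" have no precedence over each other (they are applied strictly left to right), treats adjacent terms as ANDed, and expects a well-formed query with balanced parentheses.

```python
import re

def tokenize(query):
    # Parentheses become separate tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", query)

def evaluate(query, doc_words):
    """Hypothetical evaluator: does a document containing `doc_words` match `query`?"""
    tokens = tokenize(query)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok.lower() == "not":
            return not operand()       # "not" applies to the next operand
        if tok == "(":
            value = expr()
            pos += 1                   # skip the closing ")"
            return value
        return tok in doc_words        # a plain search word

    def expr():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos].lower()
            if op in ("and", "or"):
                pos += 1
            else:
                op = "and"             # implicit AND between adjacent terms
            rhs = operand()
            value = (value and rhs) if op == "and" else (value or rhs)
        return value

    return expr()

for words in [{"foo"}, {"foo", "baz"}, {"foo", "bar"}, {"foo", "bar", "baz"}]:
    print(sorted(words),
          evaluate("foo not bar baz", words),
          evaluate("foo not (bar baz)", words))
```

A document containing only "foo" fails the first search (it lacks baz) but matches the second, because it does not contain both bar and baz.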
It is often helpful in understanding searches to write them out with the
Boolean operators and parentheses. So the above two become:
swish-e -w 'foo AND (not bar) AND baz'
swish-e -w 'foo AND (not (bar AND baz))'
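Each search above depends only on whether a document contains foo, bar, and baz, so the "same" and "not the same" claims can be checked over all eight combinations. In this Python sketch each search is written out by hand as a Boolean expression; the dictionary is illustrative, not anything Swish-e provides.

```python
from itertools import product

# Each search, written with explicit AND and parentheses as a Python expression.
SEARCHES = {
    "foo not bar baz": lambda foo, bar, baz: foo and (not bar) and baz,
    "not bar foo baz": lambda foo, bar, baz: (not bar) and foo and baz,
    "baz foo not bar": lambda foo, bar, baz: baz and foo and (not bar),
    "foo not (bar baz)": lambda foo, bar, baz: foo and (not (bar and baz)),
}

def truth_table(search):
    """Evaluate a search for every combination of foo/bar/baz being present."""
    return [search(*row) for row in product([False, True], repeat=3)]

tables = {query: truth_table(f) for query, f in SEARCHES.items()}
for query, table in tables.items():
    print(f"{query:20}", table)
```

The first three rows of output are identical; the last differs.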
These four examples are all the same search (assuming that AND is the
default search type):
swish-e -w 'juliet not ophelia and pac'
swish-e -w '(juliet) AND (NOT ophelia) AND (pac)'
swish-e -w 'juliet not ophelia pac'
swish-e -w 'pac and juliet and not ophelia'
Looking at the first three searches, Swish-e first finds all the
documents with "juliet". Then it finds all documents that do not contain
"ophelia". Those two lists are then combined with the Boolean AND
operator, resulting in a list of documents that include "juliet" but
not "ophelia". Finally, that list is ANDed with the list of documents
that contain "pac", resulting in the documents that contain "juliet"
and "pac" but not "ophelia".
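The same steps can be traced on a handful of made-up documents, with each intermediate list as a Python set. This models the description above; the document names and contents are hypothetical.

```python
# Hypothetical documents and the words each one contains.
DOCS = {
    "d1": {"juliet", "pac"},
    "d2": {"juliet", "ophelia", "pac"},
    "d3": {"juliet"},
    "d4": {"ophelia", "pac"},
}
ALL_DOCS = set(DOCS)

def having(word):
    """Names of the documents that contain `word`."""
    return {name for name, words in DOCS.items() if word in words}

step1 = having("juliet")              # documents with "juliet"
step2 = ALL_DOCS - having("ophelia")  # documents without "ophelia"
step3 = step1 & step2                 # "juliet" but not "ophelia"
result = step3 & having("pac")        # ... and also "pac"

print(sorted(step1), sorted(step2), sorted(step3), sorted(result))
# ['d1', 'd2', 'd3'] ['d1', 'd3'] ['d1', 'd3'] ['d1']
```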
However, it is always possible to force the order of evaluation by
using parentheses. For example:
swish-e -w 'juliet not (ophelia and pac)'
retrieves files with "juliet" that do not contain both words "ophelia"
and "pac".
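Using the same kind of hypothetical document set, the grouped search keeps documents that the ungrouped one drops. In this Python sketch, "juliet not X" is modeled as the "juliet" documents minus the documents matching X:

```python
# Hypothetical documents and the words each one contains.
DOCS = {
    "d1": {"juliet", "pac"},
    "d2": {"juliet", "ophelia", "pac"},
    "d3": {"juliet"},
    "d4": {"ophelia", "pac"},
}
ALL_DOCS = set(DOCS)

def having(word):
    """Names of the documents that contain `word`."""
    return {name for name, words in DOCS.items() if word in words}

# juliet not (ophelia and pac)
grouped = having("juliet") - (having("ophelia") & having("pac"))
# juliet not ophelia and pac
ungrouped = having("juliet") & (ALL_DOCS - having("ophelia")) & having("pac")

print(sorted(grouped))    # ['d1', 'd3']
print(sorted(ungrouped))  # ['d1']
```

Document d3 contains "juliet" but neither "ophelia" nor "pac", so only the grouped search returns it.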
--
Bill Moseley
moseley@hank.org
Received on Wed Nov 12 21:20:46 2003