Re: possible problems with Swish-e searching instru

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Nov 12 2003 - 21:20:34 GMT
Ok, I took a few minutes to rewrite the search descriptions.  My doc
writing skills stink so I wanted to post here for review.  I'm sure someone
will find a few commas to remove.  My use of foo and bar is a bit geeky, I suppose.

I left in some parts that probably could be removed.  I think the
shorter the better.

Searching Syntax and Operations
    The "-w" command line argument is used to specify the search query to
    Swish-e.

        swish-e -w airplane

    will find all documents that contain the word airplane.

    When running Swish-e from a shell prompt, be careful to protect your
    query from shell metacharacters and shell expansions. This often means
    placing single or double quotes around your query. See "Searching with
    Perl" if you plan to use Perl as a front end to Swish-e. In the examples
    below single quotes are used to protect the search from the shell.
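
    The same care applies when a program, rather than a person at a
    prompt, builds the command line. As a rough sketch in Python (the
    helper names swish_argv and swish_shell_line are made up for this
    illustration; only the "-w" switch comes from the text above), the
    query can be passed as one argument so that no shell ever parses it,
    or quoted with the standard shlex module when a shell command string
    is really needed:

```python
import shlex

def swish_argv(query, binary="swish-e"):
    # One list element per argument: the query never passes through a shell.
    return [binary, "-w", query]

def swish_shell_line(query, binary="swish-e"):
    # shlex.join quotes each argument for a POSIX shell where needed.
    return shlex.join(swish_argv(query, binary))

print(swish_shell_line("airplane"))
print(swish_shell_line("not (foo and bar)"))
print(swish_shell_line('"not"'))
```

    A list such as the one returned by swish_argv could be handed to
    subprocess.run without shell=True, which sidesteps quoting entirely.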

    The following section describes various aspects of searching with
    Swish-e.

  Boolean Operators
    You can use the Boolean operators "and", "or", or "not" in searching. Without
    these Boolean operators Swish-e will assume you're and'ing the words
    together. The operators are not case sensitive. These three searches are
    the same:

        swish-e -w foo bar
        swish-e -w bar foo
        swish-e -w foo AND bar

    [Note: you can change the default to or'ing by changing the variable
    DEFAULT_RULE in the config.h file and recompiling Swish-e.]

    The not operator inverts the results of a search.

       swish-e -w not foo

    finds all the documents that do not contain the word foo.

    Parentheses can be used to group searches.

       swish-e -w 'not (foo and bar)'

    The result is all documents that contain neither term or only one of
    them, but not both.
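
    One way to picture this is with sets of documents. The following is
    only a toy model in Python, not how Swish-e is implemented; the file
    names and their contents are invented. It shows that "not (foo and
    bar)" selects the same documents as "(not foo) or (not bar)":

```python
# Toy corpus (made-up data): file name -> words it contains.
DOCS = {
    "both.txt": {"foo", "bar"},
    "foo_only.txt": {"foo"},
    "bar_only.txt": {"bar"},
    "neither.txt": {"baz"},
}
ALL = set(DOCS)

def having(word):
    """Names of the documents that contain the word."""
    return {name for name, words in DOCS.items() if word in words}

# not (foo and bar): everything except the documents holding both words.
not_both = ALL - (having("foo") & having("bar"))

# (not foo) or (not bar): the same set, by De Morgan's law.
either_missing = (ALL - having("foo")) | (ALL - having("bar"))

print(sorted(not_both))
print(not_both == either_missing)
```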

    To search for the words and, or, or not, place them in double quotes.
    Remember to protect the quotes from the shell:

        swish-e -w '"not"'
        swish-e -w \"not\"

    will search for the word "not".

    Other examples:

         swish-e -w smilla or snow

    retrieves files containing either the word "smilla" or the word "snow".

         swish-e -w smilla snow not sense
         swish-e -w '(smilla and snow) and not sense'  (same thing)

    retrieves first the files that contain both the words "smilla" and
    "snow"; then among those the ones that do not contain the word "sense".

  Truncation
    The wildcard (*) is available, but it can only be used at the end of
    a word; otherwise it is considered a normal character (i.e. it can be
    searched for if included in the WordCharacters directive).

         swish-e -w librarian

    this query only retrieves files which contain the given word.

    On the other hand:

         swish-e -w 'librarian*'

    retrieves "librarians", "librarianship", etc. along with "librarian".

    Note that wildcard searches combined with word stemming can lead to
    unexpected results. If stemming is enabled, a search term with a
    wildcard is stemmed internally before searching. A search for
    "running*" therefore becomes a search for "run*", and so it will also
    find "runway". Likewise, a search for "runn*" will not find "running"
    as you might expect: "running" is stored in the index as "run", and
    "runn*" does not match "run".

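    The interaction between stemming and wildcards can be sketched with a
    toy model. This is an assumption-laden illustration in Python, not
    Swish-e's code: the lookup table STEMS stands in for a real stemmer,
    and the function names are invented.

```python
# Toy stand-in for a stemmer: words not listed are left unchanged.
STEMS = {"running": "run", "runs": "run"}

def stem(word):
    return STEMS.get(word, word)

def build_index(words):
    # With stemming enabled, the index stores stems, not the original words.
    return {stem(word) for word in words}

def wildcard_search(index, term):
    # The term is stemmed first, then used as a prefix.
    prefix = stem(term.rstrip("*"))
    return sorted(word for word in index if word.startswith(prefix))

index = build_index(["running", "runway"])
print(wildcard_search(index, "running*"))
print(wildcard_search(index, "runn*"))
```

    In this model the index holds "run" and "runway", so the first search
    matches both entries while the second matches nothing.
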
  Order of Evaluation
    In general, the order of evaluation is not important. Internally Swish-e
    processes the search terms from left to right. Parentheses can be used
    to group searches together, effectively changing the order of
    evaluation. For example these three are the same:

        swish-e -w foo not bar baz
        swish-e -w not bar foo baz
        swish-e -w baz foo not bar

    but these two are not the same:

        swish-e -w foo not bar baz
        swish-e -w 'foo not (bar baz)'

    The first finds all documents that contain both foo and baz, but do not
    contain bar. The second finds all documents that contain foo, but do
    not contain both bar and baz.
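
    A toy set model makes the difference between these two searches
    concrete. It is a sketch in Python with invented document names, and
    it assumes the default of and'ing adjacent words:

```python
# Toy corpus (made-up data): document name -> words it contains.
DOCS = {
    "foo_baz": {"foo", "baz"},
    "foo_bar": {"foo", "bar"},
    "foo_bar_baz": {"foo", "bar", "baz"},
    "foo_only": {"foo"},
}
ALL = set(DOCS)

def having(word):
    return {name for name, words in DOCS.items() if word in words}

# foo not bar baz  ->  foo AND (not bar) AND baz
first = having("foo") & (ALL - having("bar")) & having("baz")

# foo not (bar baz)  ->  foo AND (not (bar AND baz))
second = having("foo") & (ALL - (having("bar") & having("baz")))

print(sorted(first))
print(sorted(second))
```

    In this model the second search also returns the document that
    contains foo alone, since it does not contain both bar and baz.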

    It is often helpful in understanding searches to use the Boolean terms and
    parentheses. So the above two become:

        swish-e -w 'foo AND (not bar) AND baz'
        swish-e -w 'foo AND (not (bar AND baz))'

    These four examples are all the same search (assuming that AND is the
    default search type):

        swish-e -w 'juliet not ophelia and pac'
        swish-e -w '(juliet) AND (NOT ophelia) AND (pac)'
        swish-e -w 'juliet not ophelia pac'
        swish-e -w 'pac and juliet and not ophelia'

    Looking at the first three searches, Swish-e first finds all the
    documents with "juliet". Then it finds all documents that do not contain
    "ophelia". Those two lists are then combined with the Boolean AND
    operator, resulting in a list of documents that include "juliet" but
    not "ophelia". Finally, that list is ANDed with the list of documents
    that contain "pac" to give the final result.
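
    The left-to-right processing described here can be imitated with a
    very small evaluator. This is a hypothetical Python sketch, not
    Swish-e's parser: it handles only flat queries (no parentheses, no
    "or"), treats adjacent words as and'ed, and uses invented documents.

```python
# Toy corpus (made-up data): document number -> words it contains.
DOCS = {
    "1": {"juliet", "pac"},
    "2": {"juliet", "ophelia", "pac"},
    "3": {"juliet"},
    "4": {"pac"},
}
ALL = set(DOCS)

def having(word):
    return {name for name, words in DOCS.items() if word in words}

def evaluate(query):
    """Evaluate a flat query left to right, and'ing each term in turn."""
    result = set(ALL)
    negate = False
    for token in query.lower().split():
        if token == "and":
            continue              # explicit AND is the same as the default
        if token == "not":
            negate = True         # applies to the next term only
            continue
        matches = having(token)
        if negate:
            matches = ALL - matches
            negate = False
        result &= matches
    return result

for query in ("juliet not ophelia and pac",
              "juliet not ophelia pac",
              "pac and juliet and not ophelia"):
    print(query, "->", sorted(evaluate(query)))
```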

    However it is always possible to force the order of evaluation by using
    parentheses. For example:

        swish-e -w 'juliet not (ophelia and pac)'

    retrieves files with "juliet" that do not contain both words "ophelia"
    and "pac".



-- 
Bill Moseley
moseley@hank.org
Received on Wed Nov 12 21:20:46 2003