What is Swish-e?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Apr 14 2002 - 23:07:04 GMT
Ok, I was just looking over the README file that is included with 2.1-dev.
The first section of the README is titled "What is Swish-e?"

I'd like to have a clear description of swish, something that helps people
decide if swish is the right program to use.  Nothing I dislike more than
finding some software and after reading the introduction still not having a
clear idea what it really does.  

Swish is not always the best choice, and I think it would be helpful to
explain this as best as we can, too.

If someone asked me, I'd say that swish-e is more of a tool than an
application.  Besides being very fast at indexing and searching, I think
its strong points are the ability to control the input and output to a fine
degree.  In other words, it's very customizable, and can be integrated into
an existing web site design.

For example, the soon-to-be-released new mod_perl site contains quite a bit
of documentation.  The docs are quite long pages, so search results that
pointed at a whole page were not that useful.  By adding two small chunks of
code to the *config* file for the spider, we were able to make it split the
pages into sections and index them separately.  Now search results point to
the specific section of a page, not just the page.
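
As a rough illustration of the section-splitting idea (this is not the
spider's actual code, which lives in its Perl config file), the Python sketch
below cuts one page into per-section records.  The function name
split_sections and the convention that every section starts with
<h2 id="..."> are assumptions made only for this example.

```python
import re

# Assumed convention (hypothetical): each section begins with <h2 id="anchor">Title</h2>.
HEADING = re.compile(r'<h2\s+id="([^"]+)"\s*>(.*?)</h2>', re.IGNORECASE | re.DOTALL)


def split_sections(url, html):
    """Return one (section_url, title, body) tuple per <h2> section of the page."""
    matches = list(HEADING.finditer(html))
    sections = []
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        body = html[match.end():end].strip()
        sections.append((f"{url}#{match.group(1)}", match.group(2).strip(), body))
    return sections


page = (
    '<h2 id="install">Install</h2><p>Build it.</p>'
    '<h2 id="config">Config</h2><p>Edit the file.</p>'
)
for section_url, title, body in split_sections("http://example.com/guide.html", page):
    print(section_url, "|", title, "|", body)
```

Each tuple can then be handed to the indexer as if it were its own document,
so a hit links straight to the anchor of the matching section.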

Swish-e's big weaknesses are the lack of incremental indexing (often made up
for by indexing speed), a limit of eight-bit characters, and the lack of a
turn-key setup (hence "tool" instead of "application").

Anyway, over the last couple of months or so a few people have written me
saying that they had selected swish after reviewing a number of options.
It might be helpful to others to describe what factors made swish stand
out.  Or not.

I made a few minor changes in the README, so this is different from what's
currently on-line.

What is Swish-e?
       Swish-e is Simple Web Indexing System for Humans -
       Enhanced.  Swish-e can quickly and easily index
       directories of files or remote web sites and search the
       generated indexes.

       Swish-e is extremely fast in both indexing and searching,
       highly configurable, and can be seamlessly integrated with
       existing web sites to maintain a consistent design.
       Swish-e can index web pages, but can just as easily index
       text files, mailing list archives, or data stored in a
       relational database.

       Swish-e is an Open Source program supported by developers
       and a large group of users.  Please take time to join the
       Swish-e discussion list at http://Swish-e.org.

Basically, what I'm asking is what *basic* information would have been
helpful to you when you first were learning about (or deciding to use)
swish.  Unfortunately, I doubt there are many people on the list who decided
not to use swish; their feedback would be very valuable, too.

I said *basic* because I'm not talking about a better description of some
config directive or how to implement the search script.  Just basic,
initial info.

There's also a "Features" section of the README doc.  I've included it
below.  Please let me know if there's anything missing or you feel is
incorrect or misleading.

Thanks!

I hope people appreciate my use of "easily"...

  Key features

    *   Quickly index a large number of documents in different formats
        including text, HTML, and XML.

    *   Use "filters" to index other types of files such as PDF, gzip, or
        Postscript.

    *   Includes a web spider for indexing remote documents over HTTP.
        Follows Robots Exclusion Rules (including <META> tags).

    *   Use an external program to supply documents to Swish-e, such as an
        advanced spider for your web server, or a program to read and format
        records from a relational database management system (RDBMS).

    *   Document "properties" (some subset of the source document, usually
        defined as META or XML elements) may be stored in the index and
        returned with search results

    *   Document summaries can be returned with each search

    *   Word stemming and soundex indexing

    *   Phrase searching and wildcard searching

    *   Limit searches to HTML links

    *   Use powerful Regular Expressions to select documents for indexing

    *   Easily limit searches to parts or all of your web site

    *   Results can be sorted by relevance or by any number of properties in
        ascending or descending order

    *   Limit searches to parts of documents such as certain HTML tags
        (META, TITLE, comments, etc.) or to XML elements.

    *   Can report structural errors in your XML and HTML documents

    *   Includes example search scripts

    *   Swish-e is fast.

    *   It's open source and FREE! You can customize Swish-e and you can
        contribute your fancy new features to the project.

    *   Supported by on-line user and developer groups
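
To make the "external program" feature in the list above more concrete, here
is a minimal Python sketch of a program that writes documents to standard
output for an indexer to read.  It assumes a record format of a few header
lines (Path-Name and Content-Length), a blank line, and then the document
body; that layout is recalled from the Swish-e documentation and should be
checked against the docs shipped with your version.  The function name
format_document and the sample record are hypothetical.

```python
import sys


def format_document(path, content):
    """Build one record: header lines, a blank line, then the document bytes."""
    # Encode to eight-bit characters; anything outside Latin-1 becomes "?".
    body = content.encode("latin-1", errors="replace")
    # Content-Length counts bytes of the body, not characters.
    headers = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    return headers.encode("latin-1") + body


if __name__ == "__main__":
    # Stand-in for rows fetched from a database (hypothetical sample data).
    rows = [("db/record/1", "<html><title>First</title><body>Hello</body></html>")]
    for path, html in rows:
        sys.stdout.buffer.write(format_document(path, html))
```

The point of the design is that the indexer never needs to know where the
documents came from; anything that can print records in the expected format
can feed it.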

-- 
Bill Moseley
mailto:moseley@hank.org
Received on Sun Apr 14 23:08:31 2002