Re: Using Swish's Query Parser (to pre-filter a collection of documents)

From: Bill Moseley <moseley(at)>
Date: Thu Nov 18 2004 - 20:44:56 GMT
On Thu, Nov 18, 2004 at 12:35:12PM -0800, Masoud Pirnazar wrote:
> i think you have the correct picture, but here's another attempt:
> A:(a bunch of documents, say 500,000 docs)  |  B:(initial pre-filtering,
> qualifying say 40,000 docs)  | C:(index the 40,000 qualified docs) |
> D:(allow users to search the 40,000 qualified docs)
> (using the pipe sign | here to indicate the flow of data/different stages of
> processing)
> the end user specifies the criteria in steps B and D.  it would be easier
> for the end user to use the same query syntax in both steps.  at step B, it
> filters out a lot of unwanted documents.  at step D, they are searching
> using other criteria, so the query changes.
> a typical application:  from the 500,000 docs, i want to extract only the
> 40,000 docs that mention some kind of sport activity, then put those in the
> "sports collection" and allow end users to search the sports collection
> using whatever (unrelated) queries they want to use.

Some nice old databases worked this way (BRS was one): you run a query and
get back a set of records, then you can run further queries against that set.

In swish you would index the entire collection and then AND the filter
onto each search:

   -w some query AND type=sports
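
To see why the two approaches give the same results, here is a rough
sketch in Python. It is not Swish-e code: the toy corpus, the
"type=..." index keys, and the build_index/search helpers are all made up
for illustration. It only shows that querying within a saved set and
ANDing the filter into a single query select the same documents.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (type, text). Not Swish-e data.
DOCS = {
    1: ("sports", "local football team wins cup"),
    2: ("news", "city council votes on budget"),
    3: ("sports", "tennis final delayed by rain"),
    4: ("news", "rain floods football stadium"),
}

def build_index(docs):
    """Map each word, plus a 'type=<value>' key, to the set of doc ids."""
    index = defaultdict(set)
    for doc_id, (doc_type, text) in docs.items():
        index["type=" + doc_type].add(doc_id)
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(index, *terms):
    """Return the ids of docs matching ALL terms (an AND query)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

INDEX = build_index(DOCS)

# BRS style: run one query, keep the result set, then query within it.
sports_set = search(INDEX, "type=sports")
refined = sports_set & search(INDEX, "football")

# Single-index style: one query with the filter ANDed in.
combined = search(INDEX, "football", "type=sports")

print(refined == combined)
```

The trade-off is that the single index still holds all of the documents,
but the filter can be changed at search time without reindexing.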

Bill Moseley

