Re: Problem with Boolean NOT

From: <Jeffrey.Grunstein(at)not-real.ny.frb.org>
Date: Mon May 19 2003 - 15:57:53 GMT
In my search results, Swish-E returned documents that have the word BANK on
the page.

It's not the highlighting that's the issue. It's the fact that Swish-E is
returning the documents in the first place.

How can I go about diagnosing why this is happening?

From:     Bill Moseley <moseley@hank.org>
Date:     05/16/2003 05:12 PM
To:       Jeffrey.Grunstein@ny.frb.org
cc:       Multiple recipients of list <swish-e@sunsite.berkeley.edu>
Subject:  Re: [SWISH-E] Problem with Boolean NOT

On Fri, May 16, 2003 at 12:12:55PM -0700, Jeffrey.Grunstein@ny.frb.org wrote:
> We have Swish-E 2.2.1 running on a Solaris 9 box.
> I'm using a customized swish.cgi as the search interface.
> I'm using PhraseHighlight and a customized TemplateDefault.pm.
>
> When I try a search containing the boolean not, I still get results that
> contain the excluded word.
>
> A search like "strategic plan not bank" should return all results
> that contain "strategic" and "plan" but NOT the word "bank".

moseley@bumby:~$ cat 1.txt
 strategic plan bank

moseley@bumby:~$ cat 2.txt
 strategic plan

moseley@bumby:~$ swish-e -i 1.txt 2.txt -v0

moseley@bumby:~$ swish-e -w  strategic plan not bank -H0
1000 2.txt "2.txt" 16

Swish works left to right, so:
1) find all files with "strategic" (1.txt 2.txt)
2) find all files with "plan" (1.txt 2.txt)
3) AND those results together (1.txt 2.txt)
4) find all files without "bank" (2.txt)
5) AND 3 and 4 together to get 2.txt
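The steps above can be modeled with ordinary set operations. The sketch below is a toy in-memory inverted index in Python, not Swish-E's actual code: the `INDEX` contents simply mirror the two test files, and the function name `evaluate` is hypothetical. It assumes adjacent words are implicitly ANDed and that `not` negates only the word right after it.

```python
# Toy inverted index mirroring 1.txt and 2.txt from the session above
# (hypothetical model, not Swish-E's internal data structure).
INDEX = {
    "strategic": {"1.txt", "2.txt"},
    "plan": {"1.txt", "2.txt"},
    "bank": {"1.txt"},
}
ALL_FILES = {"1.txt", "2.txt"}


def evaluate(query: str) -> set[str]:
    """Evaluate a query left to right, ANDing each term into the result.

    Assumes implicit AND between words and that "not" negates only the
    next word; explicit "and"/"or" handling is omitted for brevity.
    """
    result = set(ALL_FILES)
    negate = False
    for word in query.lower().split():
        if word == "not":
            negate = True
            continue
        matches = INDEX.get(word, set())
        if negate:
            # Files WITHOUT the word: complement against the whole collection.
            matches = ALL_FILES - matches
            negate = False
        result &= matches
    return result


print(sorted(evaluate("strategic plan not bank")))  # prints ['2.txt']
```

Because the NOT is applied as a set complement before the final AND, 1.txt can never survive step 5, which matches the single hit in the session above.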


> But I get results that contain the word "bank". For some of the
> results, the word bank is highlighted on the results page.

The highlighting doesn't look at "not" words. I mean, there's no special
processing to avoid highlighting a "not" word, IIRC.

It would probably be wise to add that, but usually the word isn't in the
text because of the "not", so I just didn't worry about it.
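As a rough illustration of what skipping "not" words during highlighting might look like, the Python sketch below collects the terms to highlight while dropping any word that follows `not`. The function names `highlight_terms` and `highlight` and the `<b>` markup are hypothetical; this is not the PhraseHighlight module's logic, and it ignores phrases, wildcards, and stemming.

```python
import re


def highlight_terms(query: str) -> set[str]:
    """Return the query words worth highlighting, skipping negated words.

    Hypothetical helper: treats "not" as negating only the next word and
    simply ignores "and"/"or" operators.
    """
    terms = set()
    skip_next = False
    for word in query.lower().split():
        if word == "not":
            skip_next = True
        elif word in ("and", "or"):
            continue
        elif skip_next:
            skip_next = False  # negated word: do not highlight it
        else:
            terms.add(word)
    return terms


def highlight(text: str, query: str) -> str:
    """Wrap whole-word, case-insensitive matches of the kept terms in <b> tags."""
    terms = highlight_terms(query)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<b>\1</b>", text)


# prints: <b>Strategic</b> <b>plan</b> for the bank
print(highlight("Strategic plan for the bank", "strategic plan not bank"))
```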

--
Bill Moseley
moseley@hank.org
Received on Mon May 19 15:58:06 2003