Re: Using '*' wildcard in phrase searches...

From: Bill Moseley <moseley(at)>
Date: Mon Jan 05 2004 - 22:08:50 GMT
On Mon, Jan 05, 2004 at 01:03:12PM -0800, David Wood wrote:
> Sorry if this is a real RTFM'er...
> I know you can't use a '*' in the middle of a word, but can you use a '*' 
> at the end of a word in the middle of a phrase, like:
> "sneezle* weezle"
> ?
> I always thought this was a no-no, but it seems to work fine.  Could 
> somebody please confirm that?

Yes, that should work.  A phrase search is just an AND search with the
requirement that the words' position numbers differ by one (that is,
p1 = p2-1).  If you search for the phrase "sneezle weezle" then some
"sneezle" has to precede some "weezle" by exactly one position.  If you
say "sneez* weezle" it's the same thing, applied to every word that
matches sneez*.
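
To make the position rule concrete, here is a rough Python sketch of a
phrase match over a toy positional index.  It is only an illustration of
the idea described above, not swish-e's code: the index layout and the
names build_index, postings and phrase_search are all made up for the
example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [positions]} for a toy corpus."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def postings(index, term):
    """Return {doc_id: set(positions)} for a term.

    A trailing '*' matches every indexed word with that prefix.
    """
    if term.endswith("*"):
        words = [w for w in index if w.startswith(term[:-1])]
    else:
        words = [term] if term in index else []
    merged = defaultdict(set)
    for word in words:
        for doc_id, positions in index[word].items():
            merged[doc_id].update(positions)
    return merged


def phrase_search(index, first, second):
    """Docs where some match of `first` is at p and `second` is at p + 1."""
    p1 = postings(index, first)
    p2 = postings(index, second)
    return sorted(
        doc_id
        for doc_id in p1.keys() & p2.keys()          # the AND part
        if any(p + 1 in p2[doc_id] for p in p1[doc_id])  # p1 = p2 - 1
    )


docs = {
    1: "the sneezle weezle ran",
    2: "weezle then sneezle",
    3: "a sneezles weezle appeared",
}
index = build_index(docs)
print(phrase_search(index, "sneezle", "weezle"))
print(phrase_search(index, "sneez*", "weezle"))
```

Document 2 contains both words but in the wrong order, so it passes the
AND test and fails the position test.  With the wildcard, "sneezles" in
document 3 also counts as a match for the first word.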

> Also, just one small query parser thing I noticed (Bill M. is going to say 
> "hey, how about some help fixing the query parser?" here ;-); sorry, I wish 
> I could...):
> $ swish-e -w 'a*b' -f news.idx
> .. gives ...
> err: Wildcard not allowed within a word
> but
> $ swish-e -w '"a*b"' -f news.idx
> .. gives ...
> err: no results

Ya, that's a bug.  Should be reporting that as an error.

moseley@bumby:~$ swish-e -w '"a*b"' -H9
# Search words: "a*b"
# Parsed Words: " a* b "

It's adding a space after the "*" resulting in two search words.
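
As a sketch of the missing check (not the actual swish-e parser), the
test for a wildcard inside a word could be run on the raw query before
it is split into words, so that quoted and unquoted queries are treated
the same.  The function name is hypothetical, and the example assumes
word characters are plain ASCII letters and digits rather than whatever
WordCharacters is set to.

```python
import re

# Assumption: word characters are ASCII letters and digits only.
# An escaped star (\*) is not flagged, because the character before
# the star is then a backslash rather than a word character.
_MIDWORD_STAR = re.compile(r"[A-Za-z0-9]\*[A-Za-z0-9]")


def has_midword_wildcard(query):
    """True if an unescaped '*' sits between two word characters."""
    return _MIDWORD_STAR.search(query) is not None


for query in ['a*b', '"a*b"', '"sneez* weezle"', r'"foo\*bar"']:
    print(query, has_midword_wildcard(query))
```

Both 'a*b' and '"a*b"' are flagged here, while a trailing wildcard in a
phrase and an escaped star are not.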

> But a '*' in the middle of a phrase word is just as not allowed as a '*' in 
> the middle of a non-phrase word, right?

Right.  Swish-e can only do wildcards at the end of a word.  A wildcard
in the middle probably would not be that hard to implement, because you
could say:


and then have swish-e find all foo* words with the existing code, and
then with new code have it check that each word ends with the given
ending.  But that feature does not currently exist.
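
A minimal Python sketch of that two-step idea, assuming a plain list of
indexed words stands in for the index: the prefix lookup plays the role
of the existing foo* code, and the suffix filter is the new part.  The
function names and the pattern 'foo*bar' are made up for illustration,
and only a single '*' per word is handled.

```python
def prefix_matches(vocabulary, prefix):
    """Stand-in for the existing trailing-wildcard lookup (like foo*)."""
    return [word for word in vocabulary if word.startswith(prefix)]


def midword_wildcard(vocabulary, pattern):
    """Expand a pattern with one '*' in the middle, such as 'foo*bar'."""
    prefix, star, suffix = pattern.partition("*")
    if not star:
        # No wildcard at all: plain exact lookup.
        return [pattern] if pattern in vocabulary else []
    candidates = prefix_matches(vocabulary, prefix)   # existing step
    return sorted(
        word
        for word in candidates
        # New step: keep words with the given ending.  The length test
        # stops the prefix and the ending from overlapping in one word.
        if word.endswith(suffix) and len(word) >= len(prefix) + len(suffix)
    )


vocabulary = ["foobar", "foo", "foobaz", "food", "foolbar", "bar"]
print(midword_wildcard(vocabulary, "foo*bar"))
```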

You can currently do:


to search for the literal character "*", though.
But the "*" is special in that it has to be escaped regardless of whether
it's in a phrase.  The other special operators (and boolean operators)
are disabled inside of a phrase.  Assuming all these characters are in
WordCharacters you can do:

   -w '"foo\*=()bar"'

to search for the string:  foo*=()bar

You can also use quotes to disable boolean words, so that they are
searched for as ordinary words:

moseley@bumby:~$ swish-e -w 'not' | grep err
err: No search words specified

moseley@bumby:~$ swish-e -w '"not"' | grep err
err: no results

> Version is 2.4.0 on HP-UX 11.0.
> Thanks,
> David

Bill Moseley
Received on Mon Jan 5 22:08:59 2004