Re: New to SWISH-E (help with numeric search term)

From: MITCHELL TEIXEIRA <MITCHELL.TEIXEIRA(at)not-real.firstSTREETonline.com>
Date: Tue Jul 20 2004 - 22:08:48 GMT
Thanks to all Swish-E listers who've replied to my post asking for help with
SWISH-E.  

What I need to happen with my indexes and searches seems simple to me, but
the results I want aren't coming yet. As Bill suggested, a SQL search
may fit better.  So, how can I make similar results happen with SWISH-E?

For example, if I have the part number "2708" in my indexed document,
shouldn't a search term of "27" produce a "hit" in my search results without
having to add the wildcard character?  If I have the word "bedding" in my
indexed document, shouldn't a search request for "bed" find "bedding"? It's
not happening now; how can I make that happen?  I guess in SQL it would be
SELECT * FROM SEARCHDB WHERE WORD LIKE '%27%' or SELECT * FROM SEARCHDB
WHERE WORD LIKE '%bed%'.

I am using a third-party shopping cart which uses Swish-E as its underlying
search engine. About all I have access to modify is the indexing
configuration file. I do not have access to the Java servlet which calls the
Swish-E search engine and processes the search results. 

Many thanks for any ideas, suggestions, etc.
Regards,
MitchellT

-----Original Message-----
From: Peter Karman [mailto:karman@cray.com]
Sent: Monday, July 19, 2004 11:54 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: New to SWISH-E (help with numeric search term)


See Bill's comments.

And, here's what I do with a similar situation.

We have publication numbers like S-1234-10 and 004-1234-10 and 
007-12340-10 and HW-1234-10. The main four digits '1234' are the ones 
that really matter, since the numbering scheme has changed over the 
years. I want a user to be able to see the book 'S-1234-10' whether she 
searches for 'S-1234' or '1234' or 'S-1234-10' -- she should get the 
same result. My experience has been that trying to educate users to use 
the wildcard * is a futile exercise.

One initial approach I took was to use a -S prog filter to add the 
different variations in a 'pubnum' metatag. That had certain drawbacks, 
though I can't remember at the moment what they were. Now, I have a 
little 'fuzzy' function in my search script that examines the user query 
and alters it to conform to how the docs are really indexed. So if a 
user enters 'S-1234' or 'S-1234-10' or '1234' -- all of those are 
manipulated before being handed to the actual swish-e search. If the 
user enters '1234' the manipulation looks something like:

S-1234* or 004-1234* or 007-1234* or 1234

This can, of course, be a little surprising for the user, if she wanted 
only docs that exactly matched '1234'. But typically, more results are 
better than too few results.
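
A minimal sketch of what such a 'fuzzy' rewrite might look like, in Perl. The function name, the prefix list, and the rule for finding the four-digit core are assumptions made for illustration; only the expansion for '1234' is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical prefix list, copied from the example expansion above.
my @PREFIXES = ('S-', '004-', '007-');

# Rewrite a user query into the variations the docs are indexed under.
# Returns the query unchanged when no four-digit core number is found.
sub expand_pubnum_query {
    my ($query) = @_;

    # Assumption: the core number is the first run of four digits that
    # starts the query or directly follows a hyphen.
    my ($core) = $query =~ /(?:^|-)(\d{4})/;
    return $query unless defined $core;

    # One wildcard term per known prefix, then the bare number itself.
    my @terms = map { "$_$core*" } @PREFIXES;
    return join ' or ', @terms, $core;
}

print expand_pubnum_query($_), "\n" for ('1234', 'S-1234', 'S-1234-10');
```

All three sample inputs produce the same query string, so the user gets the same result whichever form she types.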

Stemming is likely NOT what you want, since from what I know of stemming 
algorithms, none of them would know what to do with your part numbers. 
Instead, you might want to add some kind of regexp to your search script 
that does:

my $before = 'xyz-2708';
my ($after) = ($before =~ m/-?(\d\d\d\d)/);

and then pass:

$before or $after

which would be in this case:

xyz-2708 or 2708

to your swish-e query. Of course, if your part numbers are more 
complicated or varied than just 4 digits (which that regexp above 
roughly matches), then you'd have to get more clever.
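
One way to get slightly more clever, sketched in Perl under the assumption that the numeric portion is always the trailing run of digits, whatever the prefix and with or without a hyphen. The function name is hypothetical, and whether a term like 'XY-2708' is treated as one word depends on how the index was built.

```perl
use strict;
use warnings;

# Build a swish-e query string from a user-entered part number.
# Assumption: the numeric part is the run of digits at the end of the term.
sub part_number_query {
    my ($before) = @_;

    # Capture the trailing digits, e.g. '2708' from 'XY-2708' or 'XY2708'.
    my ($after) = $before =~ /(\d+)$/;

    # No trailing digits, or the term is already just the number.
    return $before if !defined $after || $after eq $before;

    return "$before or $after";
}

print part_number_query($_), "\n" for ('XY-2708', 'XY2708', '3P-2708', '2708');
```

A term with no trailing digits, such as 'bedding', is passed through untouched.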



MITCHELL TEIXEIRA wrote on 7/19/04 1:53 PM:

> Hello to the list - I am new to SWISH-E and need a little hint/help with
> making the search functionality better on my web site.  Customers on my
> site can order by part number which may contain an alpha/numeric prefix.
> If they search using the numeric portion of the part number only, then
> the correct results are displayed by SWISH-E, but if they add the alpha
> prefix, no hits are generated by SWISH-E.
> 
> Example:
>  part number 2708 can be referenced simply by "2708" or "XY-2708",
> "XY2708", "3P-2708", etc.  Searching with 2708 as the search term works,
> but searching with the prefix blows it.
> 
> We have too many alpha prefixes to try to index them all with the numeric
> portion.  I'd like a little help/hint on how to configure my SWISH-E or
> how better to improve my indexing.  Reviewing the docs, I suspect what I
> want has something to do with stemming, but I get the idea that may only
> work with words?
> 
> Many thanks - 
> MitchellT

-- 
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Tue Jul 20 15:09:10 2004