Re: New to SWISH-E (help with numeric search term)

From: Peter Karman <karman(at)not-real.cray.com>
Date: Tue Jul 20 2004 - 03:56:09 GMT

See Bill's comments.

And, here's what I do with a similar situation.

We have publication numbers like S-1234-10 and 004-1234-10 and 
007-12340-10 and HW-1234-10. The main four digits '1234' are the ones 
that really matter, since the numbering scheme has changed over the 
years. I want a user to be able to see the book 'S-1234-10' whether she 
searches for 'S-1234' or '1234' or 'S-1234-10' -- she should get the 
same result. My experience has been that trying to educate users to use 
the wildcard * is a futile exercise.

One initial approach I took was to use a -S prog filter to add the 
different variations in a 'pubnum' metatag. That had certain drawbacks, 
though I can't remember at the moment what they were. Now, I have a 
little 'fuzzy' function in my search script that examines the user query 
and alters it to conform to how the docs are really indexed. So if a 
user enters 'S-1234' or 'S-1234-10' or '1234' -- all of those are 
manipulated before being handed to the actual swish-e search. If the 
user enters '1234' the manipulation looks something like:

S-1234* or 004-1234* or 007-1234* or 1234

This can, of course, be a little surprising for the user if she wanted 
only docs that exactly matched '1234'. But typically, too many results 
are better than too few.
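
For illustration only, here is a minimal sketch of what such a 'fuzzy' 
function could look like. It is not my actual script: the sub name 
fuzzy_pubnum, the @PREFIXES list, and the exact pattern are assumptions 
made up for this example, and it only handles the four-digit forms.

```perl
use strict;
use warnings;

# Hypothetical prefix list; the real set depends on how the docs are indexed.
my @PREFIXES = ('S', '004', '007');

# Expand a query such as 'S-1234-10', 'S-1234' or '1234' into an OR query
# covering the known prefix variants. Anything that does not look like a
# publication number is returned unchanged.
sub fuzzy_pubnum {
    my ($query) = @_;

    # Optional alphanumeric prefix, the four main digits, optional numeric suffix.
    my ($digits) = $query =~ m/^(?:[A-Za-z0-9]+-)?(\d{4})(?:-\d+)?$/
        or return $query;

    # One wildcard term per known prefix, plus the bare digits.
    my @terms = map { $_ . '-' . $digits . '*' } @PREFIXES;
    return join ' or ', @terms, $digits;
}

print fuzzy_pubnum('1234'), "\n";
```

The returned string is what would get handed to the swish-e search in 
place of the raw user input.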

Stemming is likely NOT what you want, since from what I know of stemming 
algorithms, none of them would know what to do with your part numbers. 
Instead, you might want to add some kind of regexp to your search script 
that does:

my $before = 'xyz-2708';
my ($after) = ($before =~ m/-?(\d\d\d\d)/);

and then pass:

$before or $after

which would be in this case:

xyz-2708 or 2708

to your swish-e query. Of course, if your part numbers are more 
complicated or varied than just 4 digits (the regexp above simply 
captures the first run of four digits it finds), then you'd have to 
get more clever.
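
Wrapped up as a sub, the idea above might look like the following rough 
sketch. The name partnum_query is hypothetical, and the fallback for 
input with no four-digit run (or input that is already just the digits) 
is my assumption about what you'd want.

```perl
use strict;
use warnings;

# Build "$before or $after", where $after is the four-digit numeric
# portion of the part number. Returns the input unchanged when no
# four-digit run is found or when the input is already just the digits.
sub partnum_query {
    my ($before) = @_;
    my ($after) = ($before =~ m/-?(\d\d\d\d)/);
    return $before unless defined $after && $after ne $before;
    return "$before or $after";
}

# Try it on a few of the forms mentioned in this thread.
print partnum_query($_), "\n" for qw(xyz-2708 XY2708 3P-2708 2708);
```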



MITCHELL TEIXEIRA wrote on 7/19/04 1:53 PM:

> Hello to the list - I am new to SWISH-E and need a little hint/help with
> making the search functionality better on my web site.  Customers on my site
> can order by part number which may contain an alpha/numeric prefix. If they
> search using the numeric portion of the part number only, then the correct
> results are displayed by SWISH-E, but if they add the alpha prefix, no hits
> are generated by SWISH-E.
> 
> Example:
>  part number 2708 can be referenced simply by "2708" or "XY-2708", "XY2708",
> "3P-2708", etc.  Searching with 2708 as the search term works, but searching
> with the prefix blows it.
> 
> We have too many alpha prefixes to try to index them all with the numeric
> portion.  I'd like a little help/hint on how to configure my SWISH-E or how
> better to improve my indexing.  Reviewing the docs, I suspect what I want
> has something to do with stemming, but I get the idea that may only work
> with words?
> 
> Many thanks - 
> MitchellT

-- 
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Mon Jul 19 20:56:37 2004