Thanks to all Swish-E listers who've replied to my post asking for help with
What I need to happen with my indexes and searches seems simple to me, but
the results I want aren't coming yet. As Bill suggested, a SQL search
may fit better. So, how can I make similar results happen with SWISH-E?
For example, if I have the part number "2708" in my indexed document,
shouldn't a search term of "27" produce a "hit" in my search results without
having to add the wildcard character? If I have the word "bedding" in my
indexed document, shouldn't a search request for "bed" find "bedding"? It's
not happening now; how can I make that happen? I guess in SQL it would be
SELECT * FROM SEARCHDB WHERE WORD LIKE '%2708%' or SELECT * FROM SEARCHDB
WHERE WORD LIKE '%bed%'.
I am using a third-party shopping cart which uses Swish-E as its underlying
search engine. About all I have access to modify is the indexing
configuration file. I do not have access to the Java servlet which calls the
Swish-E search engine and processes the search results.
Many thanks for any ideas, suggestions, etc.
From: Peter Karman [mailto:email@example.com]
Sent: Monday, July 19, 2004 11:54 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: New to SWISH-E (help with numeric search term)
See Bill's comments.
And, here's what I do with a similar situation.
We have publication numbers like S-1234-10 and 004-1234-10 and
007-12340-10 and HW-1234-10. The main four digits '1234' are the ones
that really matter, since the numbering scheme has changed over the
years. I want a user to be able to see the book 'S-1234-10' whether she
searches for 'S-1234' or '1234' or 'S-1234-10' -- she should get the
same result. My experience has been that trying to educate users to use
the wildcard * is a futile exercise.
One initial approach I took was to use a -S prog filter to add the
different variations in a 'pubnum' metatag. That had certain drawbacks,
though I can't remember at the moment what they were. Now, I have a
little 'fuzzy' function in my search script that examines the user query
and alters it to conform to how the docs are really indexed. So if a
user enters 'S-1234' or 'S-1234-10' or '1234' -- all of those are
manipulated before being handed to the actual swish-e search. If the
user enters '1234' the manipulation looks something like:
S-1234* or 004-1234* or 007-1234* or 1234
This can, of course, be a little surprising for the user, if she wanted
only docs that exactly matched '1234'. But typically, more results are
better than too few results.
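A minimal sketch of that kind of 'fuzzy' rewrite, not the actual function from my search script. The prefix list and the name fuzzy_pubnum are made up for illustration, and it assumes the hyphen is indexed as a word character so that 'S-1234*' is a single wildcard term:

```perl
use strict;
use warnings;

# Hypothetical prefix list; the real set depends on your numbering scheme.
my @PREFIXES = ('S', '004', '007');

# Rewrite a user query such as '1234', 'S-1234' or 'S-1234-10' into a
# swish-e query that covers every prefixed variant of the core number.
sub fuzzy_pubnum {
    my ($query) = @_;

    # Capture a run of exactly four digits (not part of a longer digit run).
    my ($core) = $query =~ m/(?<!\d)(\d{4})(?!\d)/;

    # Nothing recognizable: hand the query through untouched.
    return $query unless defined $core;

    my @variants = map { "$_-$core*" } @PREFIXES;
    return join ' or ', @variants, $core;
}

print fuzzy_pubnum('1234'), "\n";
```

Queries that contain no four-digit run are passed through unchanged, so ordinary word searches are not affected.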
Stemming is likely NOT what you want, since from what I know of stemming
algorithms, none of them would know what to do with your part numbers.
Instead, you might want to add some kind of regexp to your search script:
my $before = 'xyz-2708';
my ($after) = ($before =~ m/-?(\d\d\d\d)/);
and then pass:
$before or $after
which would be in this case:
xyz-2708 or 2708
to your swish-e query. Of course, if your part numbers are more
complicated or varied than just 4 digits (which that regexp above
roughly matches), then you'd have to get more clever.
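Wrapped up as a small helper, the idea might look like the sketch below. The name build_query is hypothetical, and it assumes the same four-digit pattern as above:

```perl
use strict;
use warnings;

# Build a "term or digits" query from a part number (hypothetical helper).
sub build_query {
    my ($before) = @_;

    # Same regexp as above: an optional hyphen followed by four digits.
    my ($after) = ($before =~ m/-?(\d\d\d\d)/);

    # Leave the query alone if nothing matched or it is already bare digits.
    return $before if !defined $after || $after eq $before;

    return "$before or $after";
}

print build_query('xyz-2708'), "\n";
```

A query that is already just '2708', or that has no four-digit run at all, is returned as-is rather than being turned into a redundant 'or'.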
MITCHELL TEIXEIRA wrote on 7/19/04 1:53 PM:
> Hello to the list - I am new to SWISH-E and need a little hint/help with
> making the search functionality better on my web site. Customers on my site
> can order by part number which may contain an alpha/numeric prefix. If they
> search using the numeric portion of the part number only, then the correct
> results are displayed by SWISH-E, but if they add the alpha prefix, no results
> are generated by SWISH-E.
> part number 2708 can be referenced simply by "2708" or "XY-2708",
> "3P-2708", etc. Searching with 2708 as the search term works, but searching
> with the prefix blows it.
> We have too many alpha prefixes to try to index them all with the numeric
> portion. I'd like a little help/hint on how to configure my SWISH-E or
> better to improve my indexing. Reviewing the docs, I suspect what I want
> has something to do with stemming, but I get the idea that may only work
> with words?
> Many thanks -
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:firstname.lastname@example.org
Received on Tue Jul 20 15:09:10 2004