On Wed, 29 Jan 2003, Andrew Smith wrote:
> I've been looking into this some more and it is due to extract_query_match
> not working correctly. In looking at extract_query_match, it assumes all
> metanames are given explicitly; but Swish-e doesn't require that they all
> be given (if no metaname is attached to a search term, it is assumed to be
> for swishdefault; e.g. "DNA sequence" == "swishdefault=(DNA and
> sequence)"). Thus, queries where just search terms are given, or where
> they are interspersed with other metanames, are not parsed correctly.
Yes, but if I remember correctly, swish.cgi always uses a metaname, even if
it's swishdefault. It's supposed to build up the query like:
meta=( query words ) othermeta=( other words )
> I went ahead and wrote a new version of extract_query_match which handles
> cases like above. Basically, it also looks for "metaname = ..." chunks,
> but for anything left that wasn't in such a chunk the search terms in them
> are attached to swishdefault. I wrote it as a simple recursive-descent
> parser. It solves the above cases, and seems fine on other cases too,
> although I haven't done extensive tests with it. If people think it
> useful, I'd like to donate it to the Swish-e community so it could be
> gotten from the Swish-e website. How should I go about doing this?
That would be great. That regular expression breaks if the query is not
strictly formatted. I've often thought about having swish spit out the
query in chunks based on metaname. But it gets tricky since queries can
be nested. A recursive parser is the right way to go, I think.
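
To illustrate the idea, here is a rough sketch of a recursive-descent
approach. It is written in Python rather than Perl, and it is not the code
being offered above. The function name extract_query_terms, the token
rules, and the assumption that a metaname binds only to the next word,
phrase, or parenthesized group are all illustrative guesses, not Swish-e's
actual grammar.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: parentheses, '=', and '"' are single tokens;
# anything else between whitespace is a word.
TOKEN_RE = re.compile(r'[()="]|[^\s()="]+')
OPERATORS = {"and", "or", "not"}


def extract_query_terms(query, default_meta="swishdefault"):
    """Return {metaname: [words]}; bare words go to default_meta."""
    tokens = TOKEN_RE.findall(query)
    terms = defaultdict(list)
    pos = 0

    def parse_group(meta):
        # Parse items until a closing parenthesis or end of input.
        while pos < len(tokens) and tokens[pos] != ")":
            parse_item(meta)

    def parse_item(meta):
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            # Nested group: everything inside inherits the current metaname.
            parse_group(meta)
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
        elif tok == '"':
            # Phrase: collect words up to the closing quote.
            while pos < len(tokens) and tokens[pos] != '"':
                terms[meta].append(tokens[pos])
                pos += 1
            pos += 1  # skip the closing quote (or run off the end)
        elif tok == "=":
            return  # stray '=' with no metaname in front of it
        elif pos < len(tokens) and tokens[pos] == "=":
            # "metaname = ..." chunk: the next item belongs to this metaname.
            pos += 1
            if pos < len(tokens):
                parse_item(tok)
        elif tok.lower() not in OPERATORS:
            terms[meta].append(tok)

    while pos < len(tokens):
        if tokens[pos] == ")":
            pos += 1  # ignore an unbalanced closing parenthesis
        else:
            parse_item(default_meta)
    return dict(terms)
```

With this sketch, a query that mixes bare terms and metanames, such as
DNA title=(genome or "human gene") sequence, yields DNA and sequence under
swishdefault and genome, human, and gene under title.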
> Also, I wanted to build my own CGI script (with my own forms, navigation
> features, result format, etc.) for searching with Swish-e, but I still
> wanted to make use of the highlighting modules that come with the Swish-e
> distribution. In studying the highlighting modules that come with Swish-e,
> they seemed tightly integrated with the data structures inside swish.cgi,
> and it seemed like it could be difficult to use them independently of
> swish.cgi (which is what I wanted to do). In particular, I wanted to use
> the PhraseHighlight.pm module, and I created a version of it with a
> simpler interface that could be used independently of swish.cgi.
Great idea. You are right that the interface is not very clean. It gets
passed in the results object instead of just the data it needs. That's
being lazy on my part. (It might be better to base the interface on
things available in the SWISH::API module, but that's another day...)
The phrase highlighting is really slow, too. I spent some time trying to
make it faster (asked Perlmonks, too), but it's just slow splitting all
that text and stepping through the text. Simple regular expression
matching is much faster but less accurate. I have often wondered if
writing it in C would improve speed much.
> Basically, in my modified version you create the highlight object by
> passing in a hash of the result header names and values (i.e., "Parsed
> Words" => ..., etc.); and then you highlight text by calling "highlight",
> passing in a reference to the text to highlight and the metaname whose
> terms should be highlighted in that text. The code which actually does
> the highlighting is the same, however. I didn't create similar versions
> of SimpleHighlight.pm and DefaultHighlight.pm, but they could easily be
> modified similarly. Anyway, I'd like to donate this code also if it would
> be useful to the Swish-e community.
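
For what it's worth, a standalone interface along those lines might look
roughly like the sketch below. It is in Python rather than Perl, it is not
the modified PhraseHighlight.pm, and it uses the simple regular-expression
style of matching (fast, but less accurate than phrase highlighting). The
class name SimpleHighlighter, the terms_by_meta argument (a plain mapping
of metaname to words, instead of the raw result headers), and the default
<b> tags are all assumptions made for illustration.

```python
import re


class SimpleHighlighter:
    """Highlight query words in text, independent of any CGI script."""

    def __init__(self, terms_by_meta, on="<b>", off="</b>"):
        self.on = on
        self.off = off
        self.patterns = {}
        for meta, words in terms_by_meta.items():
            if not words:
                continue
            # Longest words first so longer alternatives win over prefixes.
            ordered = sorted(set(words), key=len, reverse=True)
            alternation = "|".join(re.escape(w) for w in ordered)
            self.patterns[meta] = re.compile(
                r"\b(?:%s)\b" % alternation, re.IGNORECASE
            )

    def highlight(self, text, metaname):
        """Return text with the words for metaname wrapped in on/off tags."""
        pattern = self.patterns.get(metaname)
        if pattern is None:
            return text  # nothing to highlight for this metaname
        return pattern.sub(lambda m: self.on + m.group(0) + self.off, text)
```

Usage would be something like
SimpleHighlighter({"swishdefault": ["dna", "sequence"]}).highlight(text, "swishdefault"),
which keeps the highlighter free of any dependency on the results object.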
You want CVS access?
Bill Moseley email@example.com
Received on Thu Jan 30 02:11:39 2003