Re: [swish-e] grouping of search results by site?

From: Peter Karman <peter(at)>
Date: Thu Jan 24 2008 - 02:54:08 GMT

Kevin Porter wrote on 1/23/08 6:51 PM:
> Hi,
> Can anyone advise how best to implement grouping of search results, for 
> example to show no more than 3 results from any one site in any one page 
> of results?

I would start by defining a MetaName and PropertyName (say, site_name) that 
records which site each document belongs to. You could use ExtractPath or some 
other feature to automate setting that value.

Then it would depend on whether (1) I wanted to show 3 results from siteA on 
only one page of results, ever, or (2) I could show 3 results from siteA on 
every page.

If #1, then it's simpler. Just grab 100 results at a time, skipping siteA hits 
after I'd seen 3. Then remember the last valid result number, and pass that in 
my paging URL syntax so I could pick up where I left off on the next page. In 
Perl it would look something like this UNTESTED CODE:

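  my %seen;
  my $offset = 0;   # position of the last result examined
  my @hits;
  # get_batch_of_results() and get_result_props() are placeholder helpers
  BATCH: while (my $results = get_batch_of_results()) {
    RESULT: while (my $result = $results->next_result) {
      $offset++;
      # skip any site that already has 3 hits
      next RESULT if ++$seen{ $result->property('site_name') } > 3;
      push(@hits, get_result_props( $result ) );
      last BATCH if @hits == $max_hits_to_show;
    }
  }
  # then pass $offset in paging URL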

If #2, it's a little trickier. If my results_per_page size were, say, 10, 
then I would fetch 100 hits at a time, sorted by the site_name PropertyName, 
then by swishrank. Then iterate over the 100 results, pulling out (in your 
example) the first 3 from each site_name, until I had my 10 groups of 3 or 
fewer results. If 100 results didn't give me enough variety to amass 10 
groups, then I'd fetch another 100. To page results, I'd derive a per-site 
offset from the page_number ((page_number - 1) * 3), so on page 2, I'd grab 
results 4-6 from each site_name, etc.
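A rough sketch of that per-site paging in Perl, so that page 2 yields results 
4-6 from each site. It works on plain hash refs rather than live swish-e 
results: the page_groups() helper and the 'site_name', 'swishrank' and 'path' 
keys are made up for illustration, and the input is assumed to be sorted by 
site_name and then by descending swishrank already.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any swish-e API.
# $hits: array ref of hash refs, sorted by site_name, then by rank.
sub page_groups {
    my ($hits, $page_number, $per_site, $max_groups) = @_;
    my $skip = ($page_number - 1) * $per_site;  # hits per site shown earlier
    my (%seen, %group, @order);
    for my $hit (@$hits) {
        my $site = $hit->{site_name};
        my $n    = ++$seen{$site};              # 1-based position within site
        next if $n <= $skip or $n > $skip + $per_site;
        if (!$group{$site}) {
            # input is sorted by site, so earlier groups are complete here
            last if @order == $max_groups;
            push @order, $site;
        }
        push @{ $group{$site} }, $hit;
    }
    return map { +{ site_name => $_, hits => $group{$_} } } @order;
}

# Made-up sample data: four hits from one site, two from another.
my @hits = (
    { site_name => 'a.example', swishrank => 1000, path => 'a1' },
    { site_name => 'a.example', swishrank => 900,  path => 'a2' },
    { site_name => 'a.example', swishrank => 800,  path => 'a3' },
    { site_name => 'a.example', swishrank => 700,  path => 'a4' },
    { site_name => 'b.example', swishrank => 950,  path => 'b1' },
    { site_name => 'b.example', swishrank => 500,  path => 'b2' },
);

for my $page (1, 2) {
    for my $g (page_groups(\@hits, $page, 3, 10)) {
        printf "page %d  %s: %s\n", $page, $g->{site_name},
            join(' ', map { $_->{path} } @{ $g->{hits} });
    }
}
```

With this sample data, page 1 holds a1-a3 and b1-b2, and page 2 holds only a4, 
since b.example has nothing left to show. A real version would keep fetching 
batches of 100 until it had enough groups.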

In order to preserve some sense of the original ranking, I'd sum the swishrank 
values from each group, and then re-sort the groups by that sum before 
presenting results to the user.
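A minimal sketch of that re-sort step, again on plain hash refs. The 
sort_groups_by_rank() helper and the group layout (a 'site_name' plus a 'hits' 
array ref whose entries carry 'swishrank') are assumptions for illustration, 
not anything swish-e provides.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: takes a list of group hash refs, stores each
# group's summed rank under 'rank_sum' (modifying the hash refs in
# place), and returns the groups ordered by that sum, highest first.
# Ties fall back to site_name so the order is stable. Call it in list
# context.
sub sort_groups_by_rank {
    my @groups = @_;
    for my $g (@groups) {
        # sum() returns undef for an empty list, hence the // 0
        $g->{rank_sum} = sum( map { $_->{swishrank} } @{ $g->{hits} } ) // 0;
    }
    return sort {
        $b->{rank_sum} <=> $a->{rank_sum}
            or $a->{site_name} cmp $b->{site_name}
    } @groups;
}

# Made-up sample groups.
my @groups = (
    { site_name => 'a.example',
      hits => [ { swishrank => 1000 }, { swishrank => 200 }, { swishrank => 100 } ] },
    { site_name => 'b.example',
      hits => [ { swishrank => 950 }, { swishrank => 900 } ] },
);

for my $g (sort_groups_by_rank(@groups)) {
    printf "%s (rank sum %d)\n", $g->{site_name}, $g->{rank_sum};
}
```

Here b.example (950 + 900 = 1850) sorts ahead of a.example (1000 + 200 + 100 = 
1300), even though a.example has the single best hit. A plain sum also favors 
groups with more hits, so a max or an average might be worth trying instead.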

This solution is probably naive; it's off the cuff. But it might give you some 
sense of how someone else would implement the feature.

Peter Karman  .  .  peter(at)
Users mailing list
Received on Wed Jan 23 21:54:10 2008