Re: [swish-e] grouping of search results by site?

From: Kevin Porter <kev(at)>
Date: Tue Feb 05 2008 - 14:34:03 GMT
OK, thanks for the pointers, Pete.

The real problem was predicting how many 'showable' results pages 
there would be, which isn't possible without looping through the 
entire result set (i.e. possibly hundreds of thousands of hits). Even 
Google doesn't seem to have a proper solution to this: they sometimes 
just 'guess' and show 10 'next page' links, but if you click '10' you 
end up at 7 because they haven't actually worked out whether there 
really are 10 pages of results!

Anyway, I've made my own solution which I'm fairly happy with, doing the 
best I can to compromise on the trade-offs. It goes through the search 
results, grouping as it goes, far enough to fill the next 3 pages or so, 
and shows 'next page' links only for those pages that definitely exist. 
I've just done it PHP-side for now, and of course it's slow, but I may 
re-code it in C to speed it up when I get time.
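
For anyone wanting to try the same idea, here is a rough sketch of the look-ahead approach described above. It is written in Python purely for illustration (it is not the PHP code mentioned above). The name `build_pages`, the dict-shaped hits with a `site` key, and the choice to defer over-quota hits to a later page rather than drop them are all assumptions made for the sketch.

```python
from collections import Counter, deque


def build_pages(results, per_page=10, max_per_site=3, pages_needed=4):
    """Build at most `pages_needed` pages, reading `results` lazily.

    Each page holds at most `max_per_site` hits from any one site.
    Hits that exceed the cap are deferred to a later page (an assumption;
    they could equally be dropped).
    """
    source = iter(results)
    deferred = deque()  # hits held back from earlier pages, in rank order
    pages = []
    while len(pages) < pages_needed:
        page, per_site, held = [], Counter(), []
        while len(page) < per_page:
            # Deferred hits go first so they keep their original ordering.
            hit = deferred.popleft() if deferred else next(source, None)
            if hit is None:
                break  # the underlying result set is exhausted
            if per_site[hit["site"]] >= max_per_site:
                held.append(hit)
            else:
                per_site[hit["site"]] += 1
                page.append(hit)
        deferred = deque(held + list(deferred))
        if not page:
            break  # nothing left to show
        pages.append(page)
    return pages


# Hypothetical data: 5 hits from site "a", 2 from site "b", in rank order.
hits = [{"site": s, "id": i} for i, s in enumerate(["a", "a", "a", "b", "a", "b", "a"])]
current = 1
pages = build_pages(hits, per_page=4, max_per_site=2, pages_needed=current + 3)
# Only link to pages that were actually built.
next_links = list(range(current + 1, len(pages) + 1))
```

To render page N, the caller asks for N + 3 pages and links only to those that came back, so the result set is never walked further than needed.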


- Kev

Peter Karman wrote:
> Kevin Porter wrote on 1/23/08 6:51 PM:
>> Hi,
>> Can anyone advise how best to implement grouping of search results, for 
>> example to show no more than 3 results from any one site in any one page 
>> of results?
> I would start by having a MetaName and PropertyName for each site. You could use 
> ExtractPath or some other feature to automate setting that value.
> Then it would depend on if (1) I wanted to only ever show 3 results from siteA 
> on only one page of results, or (2) if I could show 3 results from siteA on 
> every page.
> If #1, then it's simpler. Just grab 100 results at a time, skipping siteA hits 
> after I'd seen 3. Then remember the last valid result number, and pass that in 
> my paging URL syntax so I could pick up where I left off on the next page. In 
> Perl it would look something like this UNTESTED CODE:
>   my %seen;
>   my $offset = 0;  # raw results consumed so far, including skipped ones
>   my @hits;
>   BATCH: while (my $results = get_batch_of_results()) {
>     RESULT: while (my $result = $results->next_result) {
>       $offset++;
>       # skip any site we've already shown 3 hits for
>       next RESULT if ++$seen{ $result->property('site_name') } > 3;
>       push(@hits, get_result_props( $result ) );
>       last BATCH if @hits == $max_hits_to_show;
>     }
>   }
>   # then pass $offset in paging URL
> If #2, it's a little trickier. If my results_per_page size were, say, 10, 
> then I would fetch 100 hits at a time, sorted by the site_name PropertyName, 
> then by swishrank. Then iterate over the 100 results, pulling out (in your 
> example) the first 3 from each site_name, until I had my 10 groups of 3-or-fewer 
> hits. If 100 results didn't give me enough variety to amass 10 total hits, then 
> I'd fetch another 100. To page results, I'd derive the offset from the page_number 
> ((page_number-1)*3), so on page 2, I'd grab results 4-6 from each site_name, etc.
> In order to preserve some sense of the original ranking, I'd sum the swishrank 
> values from each group, and then resort by that sum before presenting results to 
> the user.
> This solution is probably naive; it's off the cuff. But it might give you some 
> sense of how someone else would implement the feature.
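
A rough sketch of Peter's option #2 above, again in Python for illustration only. The function name `grouped_page`, the dict keys `site_name` and `swishrank`, and the reading that a page holds up to 10 site groups (rather than 10 individual hits) are assumptions. The per-site offset is chosen so that page 2 covers results 4-6 from each site, as in the description.

```python
from collections import defaultdict


def grouped_page(hits, page_number, per_site=3, groups_per_page=10):
    """Return up to `groups_per_page` (rank_sum, site_name, hits) tuples.

    `hits` is any iterable of dicts with 'site_name' and 'swishrank' keys
    (hypothetical field names standing in for the indexed properties).
    """
    by_site = defaultdict(list)
    for hit in hits:
        by_site[hit["site_name"]].append(hit)

    start = (page_number - 1) * per_site  # page 2 -> results 4-6 per site
    groups = []
    for site, site_hits in by_site.items():
        site_hits.sort(key=lambda h: h["swishrank"], reverse=True)
        chunk = site_hits[start:start + per_site]
        if chunk:  # sites with nothing left for this page are skipped
            rank_sum = sum(h["swishrank"] for h in chunk)
            groups.append((rank_sum, site, chunk))

    # Re-sort groups by summed rank to preserve a sense of the original ranking.
    groups.sort(key=lambda g: g[0], reverse=True)
    return groups[:groups_per_page]


# Hypothetical data: site "x" has four hits, site "y" has three.
sample = (
    [{"site_name": "x", "swishrank": r} for r in (1000, 900, 800, 700)]
    + [{"site_name": "y", "swishrank": r} for r in (990, 980, 970)]
)
page1 = grouped_page(sample, 1)
page2 = grouped_page(sample, 2)
```

This version groups an in-memory list; fetching in batches of 100 until enough groups exist would wrap the same logic in a loop.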

Kevin Porter
Advanced Web Construction Ltd

Users mailing list
Received on Tue Feb 5 09:34:16 2008