Thomas den Braber wrote on 10/12/2009 02:18 PM:
>> my $facets = $results->facets_for('color')->sort_by_count;
> That is the one I am looking for.
> Can you say something about the extra performance/memory this facets
> search cost ?
> Especially if there are many facet values (> 10000) ?
Everything costs something. You can minimize the cost of facet
collection if you reduce the total number of loops and move the
evaluating code closer to the compiled language.
The FacetFinder in swish_xapian is in C++ so it is about as fast as it
can be. You could use that as a benchmark when comparing the equivalent
code in a language binding like perl.
You have to look at every match, or a representative sample. The xapian
MatchDecider is optimized so that as it is doing the result set
comparison (running the search), it also collects facets. So there is
only one loop.
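
To illustrate the single-loop idea, here is a rough sketch in plain Perl. It is not the Xapian or Swish3 API: the @docs list, the substring match and the search_with_facets() name are all made up for the example. The point is only that the match test and the facet count happen in the same iteration.

```perl
use strict;
use warnings;

# Hypothetical in-memory "index"; a real engine would not look like this.
my @docs = (
    { id => 1, text => 'red apple',   color => 'red'   },
    { id => 2, text => 'green apple', color => 'green' },
    { id => 3, text => 'red car',     color => 'red'   },
    { id => 4, text => 'blue car',    color => 'blue'  },
);

# One loop: test each document against the query and, if it matches,
# count its facet value in the same iteration.
sub search_with_facets {
    my ( $term, $facet_field, $docs ) = @_;
    my ( @matches, %facet_count );
    for my $doc (@$docs) {
        # Assumed matching rule for the sketch: plain substring search.
        next unless index( $doc->{text}, $term ) >= 0;
        push @matches, $doc->{id};
        $facet_count{ $doc->{$facet_field} }++;
    }
    return ( \@matches, \%facet_count );
}

my ( $matches, $facets ) = search_with_facets( 'apple', 'color', \@docs );
```

Here $matches holds the matching ids and $facets the per-color counts, both built in one pass over the documents.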
With Swish 2.x IIRC you would have to either run 2 searches, one with no
limit to get the facets and another with a limit to see just the page of
results you want, or, run 1 search and manage the paging yourself in
your code. Either way, you have to do the facet collection *after* the
search has been performed, so you effectively have two loops.
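
A sketch of the "run 1 search and manage the paging yourself" variant, again in plain Perl with invented names (facets_and_page() and the shape of the hit list are assumptions, not the Swish 2.x API). It assumes the search has already returned every hit, so the facet count is a second loop over that list:

```perl
use strict;
use warnings;

# Hypothetical: $hits is the complete, unlimited result list of one
# search, each hit being a hashref that carries the facet value.
sub facets_and_page {
    my ( $hits, $facet_field, $page, $page_size ) = @_;

    # Second loop (the first was the search itself): count facet
    # values over all hits, not just the page being displayed.
    my %facet_count;
    $facet_count{ $_->{$facet_field} }++ for @$hits;

    # Paging done in the calling code: slice out the requested page.
    my $start = ( $page - 1 ) * $page_size;
    my $end   = $start + $page_size - 1;
    $end = $#$hits if $end > $#$hits;
    my @page_hits = $start <= $end ? @{$hits}[ $start .. $end ] : ();

    return ( \%facet_count, \@page_hits );
}
```

A call such as facets_and_page( $hits, 'color', 2, 10 ) would give facet counts for the whole result set plus hits 11 to 20.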
With Swish3 with Xapian it'll happen *while* the search is being
performed, so should be somewhat faster.
The overhead will be more pronounced with many facet values if your
calling code is in the binding language rather than in C++ (the
xapian core language). There's a lot of overhead spent crossing
boundaries between the compiled library and the binding language (perl,
> If there are many facet values and I only need the top 10, are all facet
> values still loaded into the $facets array ?
Bear in mind I haven't written the code yet ;) , so we could have
$facets = $results->facets_for('color')->sort_by_count_limit(10);
But all the facets will be there in $results, and limiting the number
returned is mostly a convenience, since the expensive part is in
building the list to start with, not slicing it once you've got it. You
don't know what the top 10 are till you have looked at a big enough sample.
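
For what it's worth, the slicing step could look something like the sketch below. top_facets() is a hypothetical stand-in for the unwritten sort_by_count_limit(), and it assumes the counts are already sitting in a plain hash of facet value => count. Note that it still sorts every value before cutting the list down, which is why the limit saves little.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any released Swish3 API: given a
# hashref of facet value => count, return the top $limit pairs as
# [ value, count ] arrayrefs, highest count first.
sub top_facets {
    my ( $facet_count, $limit ) = @_;
    my @sorted = sort {
        $facet_count->{$b} <=> $facet_count->{$a}    # highest count first
            or $a cmp $b                             # ties: alphabetical
    } keys %$facet_count;

    # Drop everything past the first $limit entries.
    splice( @sorted, $limit ) if @sorted > $limit;
    return [ map { [ $_, $facet_count->{$_} ] } @sorted ];
}
```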
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Mon Oct 12 15:41:11 2009