Re: Combining stem/non stem removing dups in perl

From: Peter Karman <karman(at)not-real.cray.com>
Date: Thu Nov 04 2004 - 19:03:25 GMT

Brad Miele wrote on 11/04/2004 11:55 AM:
> ok,
> 
> And bear with me here, because I have a proclivity for denseness :)
> 

me too. :)


> so, if my $indexes variable looks something like:
> 
> "nonstemmed.index stemmed.index"
> 
> and I apply the keyword query "elevator",
> 
> my hits value is say 700, which is the total results of the combined
> indexes sorted by whatever method was supplied.
> 

since you want to separate (sort) the hits into stemmed and nonstemmed, 
why not run 2 queries for 'elevator', one on each index, and return the 
range from each that you want? that way you don't have to cycle through 
all the nonstemmed hits before you get to the stemmed ones.

# NO error checking here - that's not good
use strict;
use warnings;
require SWISH::API;
die "need SWISH::API 0.03 or newer\n" if $SWISH::API::VERSION < 0.03;
my $q = 'elevator';
my $info = {};
my $stop = 20;  # how many hits from each index
INDEX: for my $index ( 'stemmed.index', 'nonstemmed.index' ) {
     my $swish = SWISH::API->new( $index );
     my $search = $swish->New_Search_Object( $q );
     $search->SetSort( 'id swishrank' );   # sort first by id
                                           # so that results align
     my $results = $search->Execute;       # keep the results object

     my $cnt = 0;
     # $results->SeekResult() here to get to where you want to start

     RESULT: while ( my $result = $results->NextResult ) {
         my @props = $result->PropertyList;
         PROP: for my $prop ( @props ) {
             $info->{$index}->{$cnt}->{ $prop->Name } =
                 $result->Property( $prop->Name );
         }
         last RESULT if ++$cnt == $stop;
     }
}



then $info would look like:

{
   'stemmed.index' => {
     0 => {
         propertyname1 => 'value',
         propertyname2 => 'value2',
          },
     1 => {
  ..........
  },
   'nonstemmed.index' => {
     0 => {
         .....

}

etc.

so you could then sort, rearrange to your heart's content.
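
as a rough sketch of that rearranging step: assuming $info has the shape 
shown above, and assuming every hit carries a property that identifies 
the document (swishdocpath is only a guess at what your indexes store), 
something like this would drop the dups and push the stemmed-only hits 
to the end. merge_hits() and the _index key are made-up names for 
illustration, not part of SWISH::API.

```perl
use strict;
use warnings;

# merge_hits() is a hypothetical helper, not part of SWISH::API.
# $info  : { index_name => { position => { property => value } } }
# $key   : property assumed to identify a document uniquely
# @order : index names, highest priority first
sub merge_hits {
    my ( $info, $key, @order ) = @_;
    my ( %seen, @merged );
    for my $index (@order) {
        my $hits = $info->{$index} || {};
        # positions are numeric hash keys, so sort them numerically
        for my $pos ( sort { $a <=> $b } keys %$hits ) {
            my $hit = $hits->{$pos};
            my $id  = $hit->{$key};
            next unless defined $id;    # skip hits lacking the key property
            next if $seen{$id}++;       # already taken from an earlier index
            # copy the hit and remember which index it came from
            push @merged, +{ %$hit, _index => $index };
        }
    }
    return \@merged;
}
```

calling merge_hits( $info, 'swishdocpath', 'nonstemmed.index', 
'stemmed.index' ) would list the nonstemmed hits first, followed by 
whatever only the stemmed index found.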

for the next 'page' use the SeekResult method. the SWISH::API docs have 
a good example.
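
a minimal sketch of the paging idea, assuming $results is the object 
returned by $search->Execute. fetch_page() is a made-up helper name, and 
the zero-based page numbering is my own choice:

```perl
use strict;
use warnings;

# fetch_page() is a hypothetical helper, not part of SWISH::API.
# $results  : the object returned by $search->Execute
# $page     : zero-based page number
# $per_page : hits per page
sub fetch_page {
    my ( $results, $page, $per_page ) = @_;
    my $start = $page * $per_page;
    # seeking past the last hit is an error, so bail out early
    return [] if $start >= $results->Hits;
    $results->SeekResult($start);    # positions are zero-based
    my @hits;
    while ( my $result = $results->NextResult ) {
        push @hits, $result;
        last if @hits == $per_page;
    }
    return \@hits;
}
```

so fetch_page( $results, 1, 20 ) would hand back hits 21 through 40, if 
there are that many.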

> and then, for the next 20 records, do the same thing, but start my looping
> at record x + 20. I just can't get my head around how to get swish to do
> the page stuff for me. now, what i ultimately want to do is push the
> stemmed stuff towards the end, and I think that I will use prog to set a
> sort like stemmed with a value of 1, but i still need to avoid the
> records.

again, use the SeekResult method.

	
> 
> sorry, my understanding of perl structures, and structure in general ;)


check out the great perl book, Learning Perl Objects, References & 
Modules by Schwartz. It's sort of a Learning More Perl. it transformed 
my understanding of the power of perl.


-- 
Peter Karman . http://www.cray.com/craydoc/ . karman(at)not-real.cray.com
"I love deadlines. I love the whooshing sound they make as they go by."
         - Douglas Adams
Received on Thu Nov 4 11:03:26 2004