Re: Applicability of Swish-E... Thoughts?

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Jul 07 2005 - 12:11:18 GMT
Net Virtual Mailing Lists scribbled on 7/7/05 4:46 AM:


> 
> . Yet I can do things like "category=bc" and get a result....
> 
> 
> I originally tried doing:
> 
> <listing>
>   <id>278232</id>
>   <category>a</category>
>   <category>a.b</category>
>   <category>a.b.b</category>
>   <category>a.b.b.g</category>
>   <category>a.d</category>
>   <category>a.d.c</category>
>   <category>a.d.c.bc</category>
> </listing>
> 
> . but this didn't seem any better....  I feel as though I am missing
> something very basic here, might you know what it is?....
> 

You need to add a period as a valid WordCharacters value -- see the *Characters 
config params. Otherwise "a.b.b" is split into separate words at the periods.
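
A minimal config sketch of that change (UNTESTED), assuming your categories only 
use lowercase letters, digits, and periods. The exact character lists here are an 
assumption, so start from whatever you have now and just append the period:

```
# Presumably already in your config, since "category=bc" works
MetaNames category

# Let a period be part of a word, so "a.b.b" is indexed as one token
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.

# Characters a word may begin or end with (no period here)
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters abcdefghijklmnopqrstuvwxyz0123456789
```

These settings apply to the whole index, not just to the category MetaName, and 
you have to reindex before they take effect.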


> What I would really like is a way to say something like "swish-e -w UNIX"
> and have it return to me something like this:
> 
> a       15
> a.b     15
> a.b.b   5
> a.b.b.g 2
> a.b.b.h 3
> a.b     10
> a.b.g   10
> a.b.g.b 10
> 
> .. where the number to the right is the total count of matching records
> for each category.
> 
> Is what I am after here possible with Swish-E?  I know that I can feed
> the output of it into a script to generate this summary, but this is slow
> work...   I know nothing about how Swish-E is architected at this point, but
> it almost seems like Swish-E would need to have everything it needs to
> internally generate this summary very quickly.

Swish-e is just a text indexer. It can keep track of text, and the context 
(MetaNames) in which the text is found, and can even store the text itself (as a 
Property). But it doesn't have any features for summarizing results like you're 
describing.

However, I can imagine some ways to still get what you want. If you know all the 
possible categories you are interested in, you can use the API to perform a 
series of searches on an open index (or indexes) and still make it go pretty fast.

Example (in Perl) (UNTESTED!):

use strict;
use warnings;
use SWISH::API;

# open the index once and reuse the handle for every search
my $swish = SWISH::API->new( 'index.swish-e' );
my $q = 'UNIX';
my @categories = qw( a a.b a.b.b a.b.b.g a.b.b.h a.b.g );
my %count;

# one search per category, limited to records that also match $q
for my $c (@categories)
{
     my $results = $swish->Query( "$q and category=$c" );
     $count{$c} = $results->Hits || 0;
}

# do something with the count
for my $c (@categories)
{
     print "$c    $count{$c}\n";
}
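
If the list of categories is not known up front, the script approach mentioned in 
the question could be sketched like this (also UNTESTED against a real index). 
The helper names ancestors() and tally_categories() and the sample data are made 
up for illustration. How you pull each hit's categories out of the results is 
left out on purpose.

```perl
use strict;
use warnings;

# Hypothetical helper: expand 'a.b.b.g' into ('a', 'a.b', 'a.b.b', 'a.b.b.g').
sub ancestors {
    my ($path) = @_;
    my @parts = split /\./, $path;
    return map { join '.', @parts[ 0 .. $_ ] } 0 .. $#parts;
}

# Hypothetical helper: takes one array ref of category paths per matching
# record and counts each category (and each of its ancestors) once per record.
sub tally_categories {
    my (@records) = @_;
    my %count;
    for my $cats (@records) {
        my %seen;    # avoid counting a category twice for the same record
        for my $path (@$cats) {
            $seen{$_} = 1 for ancestors($path);
        }
        $count{$_}++ for keys %seen;
    }
    return \%count;
}

# Made-up sample data: two matching records and their leaf categories.
my @records = ( [ 'a.b.b.g', 'a.d.c.bc' ], ['a.b.b.h'] );
my $count = tally_categories(@records);
printf "%-10s %d\n", $_, $count->{$_} for sort keys %$count;
```

This makes a single pass over the hits instead of one search per category, at 
the cost of reading every matching record.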

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Jul 7 05:11:26 2005