Re: [swish-e] parallelism and Swish-e

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Mar 27 2009 - 23:44:47 GMT
Andrew Smith wrote on 3/27/09 4:12 PM:

> 
> On a separate but related note, I'm actually considering trying to develop
> my own ranking scheme. I've been looking over the source code and it seems
> what I need to do is add a call to the new rank scheme function to getrank
> in rank.c, and then define the new ranking function (similar to getrankDEF
> and getrankIDF). Am I correct in this or missing any other key steps? 

that is correct.

> Are there any examples of other contributed ranking functions?

not that I know of.

> Any high level overview of the code (or just read the comments)? Any other
> place (wiki, development list, etc.) where development related questions
> would be more appropriate than this list?

nope. this is The Place.


>> IDF/TF is a good start, but compared to the ranking algorithms in most high
>> scale systems these days, IDF/TF is very naive. And for purists, broken in the
>> current Swish-e implementation when dealing with multiple indexes (for the
>> reason I state above).
> 
> 
> So you are saying that technically the current Swish-e is buggy when doing
> IDF for multiple index files (i.e. '-f indexfile1 indexfile2 ...')? 

yes. To be true IDF/TF, the ranking should use term frequencies taken across
all of the indexes combined, not from each index separately. But in the real
world, given indexes of approximately random content, it likely doesn't make
much difference.
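
A minimal sketch of the difference, assuming the common textbook form
IDF = log(N / df), which is not necessarily the exact formula Swish-e uses.
The index names, the counts, and the idf() helper are all made up for
illustration:

```python
import math

def idf(total_docs, doc_freq):
    """Textbook IDF, log(N / df). An assumed form, not Swish-e's exact formula."""
    return math.log(total_docs / doc_freq)

# Hypothetical per-index statistics: (documents in index, documents containing the term)
index_stats = {
    "indexfile1": (1000, 10),
    "indexfile2": (1000, 200),
}

# Per-index IDF: each index only sees its own counts.
per_index = {name: idf(n, df) for name, (n, df) in index_stats.items()}

# Global IDF: add up the counts across all indexes before computing.
total_n = sum(n for n, _ in index_stats.values())
total_df = sum(df for _, df in index_stats.values())
global_idf = idf(total_n, total_df)

for name, value in per_index.items():
    print(f"{name}: per-index IDF = {value:.3f}")
print(f"all indexes: global IDF = {global_idf:.3f}")
```

With counts this lopsided the same term gets a very different weight depending
on which index a hit comes from. When the indexes have similar content, the
per-index values sit close to the global one.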

> Also, for parallelism you would just divide up all the files to be indexed
> randomly and evenly among all the parallel processes, so each independent
> index file would be about the same size (and each index would have almost
> the same IDF statistics since you divided up files randomly). So in practice
> it shouldn't be a problem.

yes
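
The random-split argument can be checked with a small simulation. This is only
a sketch: the corpus size, the number of processes, and the log(N / df) form of
IDF are assumptions for illustration, not anything taken from Swish-e itself.

```python
import math
import random

def idf(total_docs, doc_freq):
    # Assumed textbook form log(N / df); not necessarily Swish-e's formula.
    return math.log(total_docs / doc_freq)

# Hypothetical corpus: 10,000 documents, 500 of which contain the term.
# True means "this document contains the term".
corpus = [True] * 500 + [False] * 9500

# Shuffle, then deal documents round-robin to 4 indexing processes.
rng = random.Random(0)  # fixed seed so the run is repeatable
rng.shuffle(corpus)
num_shards = 4
shards = [corpus[i::num_shards] for i in range(num_shards)]

# sum() over booleans counts the documents that contain the term.
global_idf = idf(len(corpus), sum(corpus))
shard_idfs = [idf(len(shard), sum(shard)) for shard in shards]
max_gap = max(abs(value - global_idf) for value in shard_idfs)

print(f"global IDF: {global_idf:.3f}")
for i, value in enumerate(shard_idfs):
    print(f"shard {i}: IDF = {value:.3f}")
print(f"largest gap from global: {max_gap:.3f}")
```

If the files were split by directory or by date instead of at random, a term
could cluster in one index and the per-index values would drift apart.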


> 
> 
>>
>> This is actually one of the main reasons I started Swish3, because I wanted to
>> play with alternate ranking schemes and I saw that the 2.x architecture wasn't
>> really suited to it. That, and UTF-8.
> 
> 
> Sounds nice, looking forward to seeing it. Any ETA on it?
> 

see http://swish-e.org/archive/2009-01/12415.html


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Mar 27 19:44:52 2009