Andrew Smith wrote on 3/27/09 4:12 PM:
>
> On a separate but related note, I'm actually considering trying to develop
> my own ranking scheme. I've been looking over the source code and it seems
> what I need to do is add a call to the new rank scheme function to getrank
> in rank.c, and then define the new ranking function (similar to getrankDEF
> and getrankIDF). Am I correct in this or missing any other key steps?
that is correct.
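A minimal C sketch of the pattern confirmed above: one function per ranking scheme, plus a single dispatch point that plays the role of getrank. Every name here (rank_scheme, hit_stats, rank_default, rank_custom, get_rank) is hypothetical. None of them reproduce the real types or signatures in rank.c. The "custom" scheme, which divides term frequency by document length, is only an illustrative placeholder.

```c
#include <stddef.h>

/* Hypothetical sketch: these are NOT swish-e's rank.c types or signatures. */
typedef enum { SCHEME_DEFAULT = 0, SCHEME_CUSTOM = 1 } rank_scheme;

/* Assumed per-hit statistics a ranking function might consult. */
typedef struct {
    int term_freq; /* occurrences of the term in this document */
    int doc_words; /* total words in this document */
} hit_stats;

typedef double (*rank_fn)(const hit_stats *h);

/* Stand-in for an existing scheme: raw term frequency. */
double rank_default(const hit_stats *h) {
    return (double)h->term_freq;
}

/* The new scheme being added: term frequency normalised by document length. */
double rank_custom(const hit_stats *h) {
    return h->doc_words > 0 ? (double)h->term_freq / h->doc_words : 0.0;
}

/* One table entry per scheme; adding a scheme means adding a function here. */
static const rank_fn rank_table[] = { rank_default, rank_custom };

/* Dispatcher standing in for getrank: picks the function for the scheme. */
double get_rank(rank_scheme scheme, const hit_stats *h) {
    if ((size_t)scheme >= sizeof rank_table / sizeof rank_table[0])
        scheme = SCHEME_DEFAULT; /* unknown scheme: fall back to the default */
    return rank_table[scheme](h);
}
```

The table keeps the dispatcher unchanged when a scheme is added. The only edits are the new function, its enum value, and its table entry.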
> Are
> there any examples of other contributed ranking functions?
not that I know of.
> Any high level
> overview of the code (or just read the comments)? Any other place (wiki,
> development list, etc.) where development related questions would be more
> appropriate than this list?
nope. this is The Place.
>> IDF/TF is a good start, but compared to the ranking algorithms in most high
>> scale systems these days, IDF/TF is very naive. And for purists, broken in the
>> current Swish-e implementation when dealing with multiple indexes (for the
>> reason I state above).
>
>
> So you are saying that technically the current Swish-e is buggy when doing
> IDF for multiple index files (i.e. '-f indexfile1 indexfile2 ...')?
yes. To be True IDF/TF it should take the frequencies across all indexes. But in
the real world, given indexes of approximately random content, it likely doesn't
make much difference.
> Also,
> for parallelism you would just divide up all the files to be indexed
> randomly and evenly among all the parallel processes, so each independent
> index file would be about the same size (and each index would have almost
> the same IDF statistics since you divided up files randomly). So in practice
> it shouldn't be a problem.
yes
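A small C sketch of the multi-index point discussed above. It compares the per-index ratio N_i / df_i with the pooled ratio (sum of N_i) / (sum of df_i) that a "true" IDF across all indexes would use. IDF itself is the log of this ratio, so comparing the ratios is enough here. The struct and function names are hypothetical and are not taken from the Swish-e source. The code assumes every df_i is greater than zero.

```c
#include <stddef.h>

/* Hypothetical per-index counts for one query term (not Swish-e's structs). */
typedef struct {
    long total_docs; /* N_i: documents in index i */
    long doc_freq;   /* df_i: documents in index i containing the term (> 0) */
} index_stats;

/* Ratio a per-index IDF would use: IDF_i = log(N_i / df_i). */
double local_idf_ratio(const index_stats *s) {
    return (double)s->total_docs / (double)s->doc_freq;
}

/* Ratio a pooled IDF would use: log(sum N_i / sum df_i) across all indexes. */
double global_idf_ratio(const index_stats *s, size_t n) {
    long docs = 0, df = 0;
    for (size_t i = 0; i < n; i++) {
        docs += s[i].total_docs;
        df += s[i].doc_freq;
    }
    return (double)docs / (double)df;
}
```

Suppose two indexes of 1000 documents hold the term in 10 and 100 documents. Their local ratios are then 100 and 10, while the pooled ratio is about 18.2, so per-index IDF over- or under-weights the term. If the files are split randomly and evenly, say 50 and 50, every local ratio equals the pooled one (20). This is why the discrepancy matters little in practice.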
>
>
>>
>> This is actually one of the main reasons I started Swish3, because I wanted to
>> play with alternate ranking schemes and I saw that the 2.x architecture wasn't
>> really suited to it. That, and UTF-8.
>
>
> Sounds nice, looking forward to seeing it. Any ETA on it?
>
see http://swish-e.org/archive/2009-01/12415.html
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri Mar 27 19:44:52 2009