I think there must be good ranking algorithms, or similar work already done. Would it be hard to choose which algorithm
would be most suitable and implement it? ;-) I would be ready to work on it, but I'm not a 'code master'.
Would anyone else be interested?
>Subject: [SWISH-E] Re: Title matches on result top
> From: Bill Moseley <firstname.lastname@example.org>
> Date: Tue, 9 Mar 2004 05:46:14 -0800 (PST)
> To: Multiple recipients of list <email@example.com>
>On Tue, Mar 09, 2004 at 12:38:17AM -0800, firstname.lastname@example.org wrote:
>> >You could try tweaking those, but the other problem is that swish
>> >considers to some degree the number of hits in a file, so a large file
>> >may out-rank a smaller file with the word in the title.
>> Doesn't swish-e convert frequencies into percentages? Would that be a bad idea?
>You should look at rank.c. That and the query processing are
>long-standing problems that need attention. Ranking is very basic.
>There's a mode to consider the length of the document in the rank
>calculations but when I tested the feature it didn't seem to make much
>difference in the ranking -- and in some cases made it worse.
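To make the percentage idea concrete, here is a minimal sketch in Python of ranking by the share of a document's words that are hits, rather than by the raw hit count. The function name and the sample numbers are my own assumptions for illustration; this is not how rank.c computes anything.

```python
# Hypothetical sketch: not taken from swish-e's rank.c.
def hit_percentage(hits, doc_words):
    """Return word hits as a percentage of the document's total word count."""
    if doc_words <= 0:
        return 0.0
    return 100 * hits / doc_words

# A large file with many hits versus a small file with few hits.
large = hit_percentage(hits=500, doc_words=100000)
small = hit_percentage(hits=3, doc_words=50)

# By raw count the large file wins (500 > 3); by percentage the small one does.
print(large, small)
```

Whether this helps in practice is a separate question; as noted above, the existing document-length mode did not clearly improve results when tested.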
>It's subjective, of course. What I did was index a few small (< 10,000
>pages) sites and then compare search results with Google. I spent a day
>playing with small tweaks to rank.c and it was clear that very large
>files throw off the rank. One true hack was to limit the number of
>word hits per document, and that one thing alone made the results match
>Google's ranking more closely. I just limited the frequency count to 100.
>How's that for an ugly hack?
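A rough sketch of that cap, in Python rather than C, just to show the effect. The scoring formula, the title bonus of 200 and all names here are assumptions made up for the example; only the cap of 100 comes from the message.

```python
# Hypothetical sketch: the formula and the title bonus are invented for
# illustration and do not reflect swish-e's actual rank calculation.
def score(hits, in_title, max_hits=None, title_bonus=200):
    """Score a document from its word-hit count, optionally clamping the count."""
    if max_hits is not None:
        hits = min(hits, max_hits)  # the "ugly hack": cap the frequency count
    return hits + (title_bonus if in_title else 0)

# A very large file with 5000 body hits versus a small file with the word in its title.
big_uncapped = score(5000, in_title=False)
small_uncapped = score(3, in_title=True)
big_capped = score(5000, in_title=False, max_hits=100)
small_capped = score(3, in_title=True, max_hits=100)

print(big_uncapped > small_uncapped)  # large file out-ranks the title match
print(big_capped < small_capped)      # with the cap, the title match wins
```

With the cap in place, no single file can gain more than 100 points from sheer repetition, so other signals such as a title match can decide the order.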
>I had also tried limiting the counts to the first X word positions but
>with less of an effect. I was expecting that to have more of an effect.
>If you are looking for a document about something you might think that
>it would be discussed early on in the document.
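The position cutoff could be sketched like this, assuming the index gives a list of word positions for each hit in a document. The function name and the cutoff value are placeholders of mine, since the message only says "the first X word positions".

```python
# Hypothetical sketch: assumes word positions are available per document;
# the cutoff value is a placeholder, not a number from the message.
def hits_in_first_positions(positions, cutoff):
    """Count only the word hits that occur before the given word position."""
    return sum(1 for p in positions if p < cutoff)

# Positions of the search word within one document.
positions = [2, 40, 900, 15000]
print(hits_in_first_positions(positions, cutoff=1000))
```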
>Swish-e has been used for indexing reasonably small sets of documents,
>so effective searching is often as helpful as the ranking. Still, I
>hope someone comes along who knows something about ranking and has the
>time to update swish-e's code.
Received on Tue Mar 9 07:20:35 2004