Peter Karman wrote:
>indexing as html will artificially inflate the number of occurrences whenever a
>word matches in the <title>.
This does help, but not enough for some applications. A real problem with relevance-ranked searches of collections of judicial opinions is that it's hard to force the title weight high enough to overcome large numbers of term occurrences in the body text, which is exactly what you get with important legal cases, because really important rulings are heavily cited. So the cases that repeatedly cite (e.g.) Brown v. Board of Education inevitably rank higher than Brown itself. This is all the more maddening because the more important the case the user is seeking, the more likely it is to be swamped by cases citing it. I guess other literatures manage to avoid this because their citations don't give the title of the cited document in full, as citations in judicial opinions do.
Anyway, our cheap kludge for dealing with this is to run a title-only search separately and prepend those results to the hit list for full-text search. We tried jiggering the rankings as described in this thread and it helped, but not enough.
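
For anyone who wants to try the same kludge, here is a rough sketch in Python of the merge step only. It assumes the two searches have already been run and each returned a ranked list of document identifiers. The function name, the sample identifiers, and the de-duplication step are all illustrative assumptions, not a description of our production code or of any search engine's API.

```python
def merge_title_first(title_hits, fulltext_hits):
    """Return title-only hits followed by full-text hits, in ranked order.

    Both arguments are ranked lists of document identifiers (hypothetical;
    any hashable ID works). A document that appears in both lists keeps
    only its first, title-derived position. De-duplication is an assumption
    added for this sketch.
    """
    seen = set()
    merged = []
    for doc_id in list(title_hits) + list(fulltext_hits):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


# Made-up identifiers: the sought case ranks last in the full-text results
# because the cases citing it repeat its name many times in their body text.
title_hits = ["brown_v_board"]
fulltext_hits = ["citing_case_1", "citing_case_2", "brown_v_board"]
merged = merge_title_first(title_hits, fulltext_hits)
```

With these inputs the sought case moves to the top of the merged list and the citing cases follow in their original full-text order.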
Thomas R. Bruce (email@example.com)
Director, Legal Information Institute
Cornell Law School
Received on Fri Feb 4 03:23:47 2005