We are currently using swish-e 2.4.3 to index our university's web pages.
We use the supplied spider.pl to fetch all the web pages and to filter PDF
files.
A considerable number of PDF files get indexed. The search results
always return the PDF files ranked higher than the web pages. As I
understand it, this could be due to the higher frequency of the search
word in the PDF files. I tried using the
IgnoreTotalWordCountWhenRanking directive in the swish-e config file,
but that didn't help much.
Another option was to use MetaNamesRank to bias the ranks for meta
tags. The output generated by the PDF filter is always enclosed in <pre>
.. </pre> tags, so having something like
MetaNamesRank -3 pre
and including pre in the MetaNames list does the trick for now.
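
Put together, the two config lines involved would look roughly like the
sketch below. The -3 bias is simply the value mentioned above, not a
tuned recommendation, and the comments are my reading of what each
directive does here.

```
# Sketch only: treat text inside <pre> as its own metaname.
# The PDF filter output is wrapped in <pre> .. </pre>.
MetaNames pre

# Negative bias so that matches inside <pre> (the filtered PDF text)
# rank lower than matches elsewhere.
MetaNamesRank -3 pre
```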
Is there any other way to do this that I missed? Does anyone have a
better suggestion?
Would a feature for biasing ranks based on file type be included in
future versions?
Thanks,
--
------------------------------------------------
Aliasgar Dahodwala
Application Integration Analyst
Information Systems Integration Team
Computing and Information Technology Services
University of Massachusetts Dartmouth
Phone : 508-910-6599
email : adahodwala [at] umassd [dot] edu
-------------------------------------------------
Received on Tue Jan 31 08:49:11 2006