[swish-e] Differing results for two web servers with same config but different swish-e versions

From: Robinson Craig <Craig.Robinson(at)>
Date: Mon Mar 31 2008 - 06:46:54 GMT

I'm just trying to bring a new web server online, and I'm having some
trouble with the results returned after indexing.

The new web server is a Solaris 10 machine (the old web server is
Solaris 8), so we've had to recompile libxml2, all the xpdf utilities
and swish-e. At the same time we took the opportunity to upgrade swish-e
from 2.2.3 to 2.4.5. Everything indexes fine (same stats on both
servers), but when I do a search on the new index it "seems" as if PDFs
rank much higher on our new web server than on our old one. That is, the
first PDF to show up in the search results on the old web server might
be ranked 50th from the top, whereas on the new web server it might be
6th from the top for a fairly broad search (for instance using the term
"water", which we have LOTS of documents about).

The config files are the same, and the content is the same. I am
therefore presuming that it will have something to do with the differing
infrastructure versions. Here are some details:

"Old" Solaris 8 webserver:
SWISH-E 2.2.3
pdftotext version 3.01
libxml2 2.5.7

"New" Solaris 10 webserver:
SWISH-E 2.4.5
pdftotext version 3.02
libxml2 2.6.31

We do a "file-based" index, and filter our PDFs using:

FileFilter .pdf  "pdftotext" "-htmlmeta '%P' -"
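
One way to check whether the pdftotext upgrade (3.01 to 3.02) changes what swish-e actually sees is to run the filter command by hand on the same PDF on each server and compare the output. Below is a minimal Python sketch of that idea. It assumes %P stands for the document's full path and only approximates how swish-e builds the command; the helper names expand_filter and run_filter are made up for illustration.

```python
import shlex
import subprocess


def expand_filter(program, options, path):
    """Build an argv list from a FileFilter-style program and option string.

    Assumption: %P is replaced by the full path of the document.
    The option string is then split the way a POSIX shell would split it.
    Paths containing a single quote are not handled by this sketch.
    """
    return [program] + shlex.split(options.replace("%P", path))


def run_filter(program, options, path):
    """Run the expanded filter command and return its standard output as bytes."""
    argv = expand_filter(program, options, path)
    return subprocess.run(argv, capture_output=True, check=True).stdout
```

Calling run_filter("pdftotext", "-htmlmeta '%P' -", path) for the same file on each server, saving the result, and diffing the two outputs should show whether the text and meta tags handed to the indexer differ between the two pdftotext builds.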

It looks like I might have to go back to "first principles" and
reconfigure the indexing from scratch. But if anyone can give me any
clues as to why this might be happening, I'd sure be appreciative.

Alternatively, is there any way of putting more "weight" on HTML files
as opposed to PDFs? I'm mucking around with MetaNamesRank at the
moment, but I have to admit that it is pretty much "stabbing in the
dark".

Cheers, Craig

Craig Robinson
System Administrator , Web and Publishing Services

Received on Mon Mar 31 02:47:05 2008