Problem with searching things that apparently were indexed

From: H Vernon Leighton <vleighto(at)not-real.Princeton.EDU>
Date: Thu Sep 19 2002 - 09:03:21 GMT
I am asking this for some friends who are running a swish-e site. 

They are trying to control which pages on the site get indexed by using the
"obeyRobotsNoIndex yes" directive. It works in the sense that the pages
with the meta tag "noindex" do not get indexed. But some pages that do not
have that tag are not returned in the results even though they satisfy the
search. The pattern of what gets picked up and what does not is not
obvious.

When the directive is switched to "no", all pages that satisfy the
search (both with and without "noindex") are returned. They tried debugging
with -T PROPERTIES and -k [letter], and have confirmed that the missing
pages are being indexed; they are just not being returned by the search.
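
To pin down the pattern, it may help to list independently which files actually carry the robots "noindex" meta tag, so that list can be compared with what the search returns. The following is only a rough sketch in Python using the standard library; the names has_noindex and files_with_noindex are made up for illustration, and it assumes the tag has the usual form <meta name="robots" content="noindex">. It does not try to reproduce how swish-e or libxml2 parse the page.

```python
import os
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.append(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


def files_with_noindex(root, extensions=(".html", ".htm")):
    """Walk the file system under root and return the paths tagged noindex."""
    tagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if has_noindex(handle.read()):
                    tagged.append(path)
    return sorted(tagged)
```

Any page that is missing from the results but is not in the list returned by files_with_noindex would be one of the pages in question.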

They had been using a development version of swish-e from mid-July, so
yesterday they upgraded to the new 2.2 release. The problem persists. The
parser is HTML2 using libxml2 2.4.22, and the operating system is
Solaris 2.8. Swish-e indexes via the file system, not via a spider.

Any help would be appreciated.

Vernon Leighton

EXAMPLES:

Command line to index the site:

swish-e -c swish_main.conf -f swish_main.index


A sample search command. The set of pages it returns that do not carry the
robots directive differs depending on the setting of obeyRobotsNoIndex:

swish-e -w participation -f swish_test.index -p description -d ::
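
The two result sets can be compared mechanically rather than by eye. This is a minimal sketch, assuming the search output has been saved as text from two runs (one against an index built with "obeyRobotsNoIndex no", one with "yes"), that header lines start with "#", that the listing ends with a line containing only ".", and that with "-d ::" the file path is the second "::"-separated field on each result line. Those format details are assumptions to check against real output, and the function names are hypothetical.

```python
def result_paths(output, delimiter="::"):
    """Extract the set of file paths from saved search output.

    Assumes the path is the second delimited field on each result line.
    """
    paths = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, header comments, the end marker, and error lines.
        if not line or line.startswith("#") or line == "." or line.startswith("err"):
            continue
        fields = line.split(delimiter)
        if len(fields) >= 2:
            paths.add(fields[1])
    return paths


def missing_paths(output_obey_no, output_obey_yes, noindex_paths=()):
    """Paths returned with the directive off but not with it on,
    excluding pages known to carry the noindex tag."""
    missing = result_paths(output_obey_no) - result_paths(output_obey_yes)
    return sorted(missing - set(noindex_paths))
```

Whatever missing_paths returns is the list of pages that lack the tag but still drop out of the results, which is the set worth inspecting for a common pattern.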



Portions of the swish-e configuration file:

IndexContents HTML2 .html .htm

DefaultContents HTML2

#INDEX ONLY FILES WITH THESE EXTENSIONS

IndexOnly .html .htm

obeyRobotsNoIndex yes
FollowSymLinks no

# TYPES OF DOCS NOT TO INDEX

NoContents .doc .gif .js .pdf .php .txt .xml

MetaNames subject

MetaNames description

###########################################
# Properties to be returned in the results
###########################################

StoreDescription HTML <description> 200

PropertyNameAlias swishdescription description
Received on Thu Sep 19 09:07:04 2002