Config Troubles

From: fh hillsboro <linux(at)not-real.frankhunt.com>
Date: Wed Dec 07 2005 - 15:39:04 GMT
I am recovering from a nasty system crash (let's not go into that) 
wherein I lost some of my files - including all of my swish-e config 
stuff.  After rebuilding everything, I'm seeing a problem that I have 
seen before but for the life of me, I can't remember or figure out what 
I did to solve it.  I am prepared to be embarrassed by the solution (not 
a new experience for me).

The problem is that I can't index my entire site.  Indexing works on 
some of the directories but not others.  Here's a partial list of some 
of my document root directories and files:

drwxr-xr-x   16 root     root         4096 Dec  5 14:06 FRANK
drwxr-xr-x   13 root     root         4096 Dec  4 15:45 Peggy-Sue
drwxr-xr-x   25 root     root         8192 Dec  4 15:45 SFCC
drwxr-xr-x   17 root     sys          4096 Jul 25  2002 TSC
drwxrwxrwx    9 root     sys         16384 Dec  7 00:08 Weather
drwxr-xr-x   26 root     sys          4096 Dec  5 15:19 emily
-rwxr--r--    1 root     fhunt       17784 Dec  6 22:43 index.html

I can index FRANK, Peggy-Sue, Weather and emily.
I cannot index SFCC, TSC or index.html.

robots.txt does block indexing (Disallow: /FRANK/ works, for example), but 
the entries for SFCC, TSC and index.html make no difference.
This:
User-agent: *
Disallow: /FRANK/
Disallow: /SFCC/
Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
Disallow: /index.html

works the same as this:
User-agent: *
Disallow: /FRANK/
##Disallow: /SFCC/
##Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
##Disallow: /index.html
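
One way to confirm that the two files really differ to a parser is to feed both 
to Python's standard urllib.robotparser and compare the answers. This is only a 
sketch: the agent name "example-spider" is a made-up placeholder, not the name 
spider.pl actually sends, and spider.pl's own robots.txt handling may not match 
Python's.

```python
from urllib.robotparser import RobotFileParser

# First variant from the post: every listed path is disallowed.
ROBOTS_ALL_BLOCKED = """\
User-agent: *
Disallow: /FRANK/
Disallow: /SFCC/
Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
Disallow: /index.html
"""

# Second variant: the SFCC, TSC and index.html rules are commented out.
ROBOTS_PARTLY_COMMENTED = """\
User-agent: *
Disallow: /FRANK/
##Disallow: /SFCC/
##Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
##Disallow: /index.html
"""

PATHS = ["/FRANK/", "/SFCC/", "/TSC/", "/Peggy-Sue/",
         "/Weather/", "/emily/", "/index.html"]


def check(robots_text, paths=PATHS, agent="example-spider"):
    """Return {path: True if the agent may fetch it} for a robots.txt body.

    "example-spider" is a hypothetical agent name used for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    blocked = check(ROBOTS_ALL_BLOCKED)
    partial = check(ROBOTS_PARTLY_COMMENTED)
    for path in PATHS:
        print(f"{path:14} all-blocked={blocked[path]!s:6} "
              f"partly-commented={partial[path]}")
```

If the parser reports SFCC, TSC and index.html as fetchable under the second 
file while the spider still skips them, the cause is probably somewhere other 
than robots.txt.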

I'm running swish-e version 2.4.3 on Red Hat 9 (kernel 2.4.20-31.9).

Here's the run string:
/usr/local/bin/swish-e -S prog -c /web/httpd/bin/swish_index/swish.conf

Here's the config file:
IndexDir spider.pl
IndexFile /web/httpd/bin/swish_index/index.swish-e
SwishProgParameters default http://www.frankhunt.com/
Metanames swishtitle swishdocpath
StoreDescription HTML* <body> 10000
IndexReport 3

I have checked the robots.txt file with an online syntax checker and it 
is valid.  I can index other sites.  I'm going crazy here.

Ideas?

-- 
Frank Hunt
Confused Linux Admin
General Nuisance
Web Weasel
Received on Wed Dec 7 07:39:05 2005