After following the download and install instructions from
http://sunsite.berkeley.edu/SWISH-E/Manual/downloadinstall.html,
I attempted to run the first indexing session using this command on a Cobalt Linux
5.0 machine:
/usr/local/bin/swish-e -c /home/sites/home/web/swish/sw.index.conf
and received these bad directive warnings:
Bad directives on lines
104 IgnoreLastChar
113 IgnoreFirstChar
165 MaxDepth 5
170 Delay 60
174 TmpDir /var/tmp
179 SpiderDirectory /opt/swish-e.x/src/
185 EquivalentServer http://ipaddressofwebservertobeindexed/
191
Nothing is listed next to 191.
/var/tmp is a valid directory.
What am I doing wrong?
Thanks for the assistance.
Doug
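One way to line the reported numbers up against the file is to print every directive line together with its line number. The sketch below is a hypothetical helper, not part of SWISH-E, and it does not reproduce SWISH-E's own parser. It assumes only what the config file itself shows: lines starting with # are comments, and a directive is a name followed by a value. It also marks directives that have nothing after the name, as IgnoreLastChar and IgnoreFirstChar do in the file below.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of one config line (not SWISH-E code). */
enum line_kind { LINE_SKIP, LINE_DIRECTIVE, LINE_NO_VALUE };

/* Classify a line as blank/comment, a directive with a value,
 * or a bare directive name with nothing after it. */
enum line_kind classify_line(const char *line)
{
    assert(line != NULL);

    /* Skip leading whitespace. */
    while (*line && isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return LINE_SKIP;

    /* Skip the directive name. */
    while (*line && !isspace((unsigned char)*line))
        line++;

    /* Skip whitespace between the name and any value. */
    while (*line && isspace((unsigned char)*line))
        line++;

    return *line ? LINE_DIRECTIVE : LINE_NO_VALUE;
}

/* Print each directive line with its 1-based line number and mark
 * the ones that carry no value. Returns how many had no value.
 * Assumes no line is longer than the buffer; a longer line would be
 * counted as more than one line. */
int scan_config(FILE *in, FILE *out)
{
    char buf[1024];
    int lineno = 0;
    int bare = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        enum line_kind kind;

        lineno++;
        buf[strcspn(buf, "\r\n")] = '\0';   /* drop the line ending */
        kind = classify_line(buf);
        if (kind == LINE_SKIP)
            continue;
        if (kind == LINE_NO_VALUE)
            bare++;
        fprintf(out, "%4d %s%s\n", lineno, buf,
                kind == LINE_NO_VALUE ? "   <-- no value" : "");
    }
    return bare;
}
```

Calling scan_config() on the opened sw.index.conf with stdout as the second argument gives a listing that can be compared with the line numbers in the warning. It only shows where each directive sits; it says nothing about which directives a given SWISH-E build accepts.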
Copy of the sw.index.conf file (IP address removed):
# DIRECTIVES COMMON to HTTP and FILESYSTEM METHODS
###################################################
# WINDOWS USERS NOTE:
# Specify ALL files and directory paths in the
# config file using the forward slash, as
# in this directory.
#
###################################################
IndexDir http://ipaddressofwebserver/
# For the FileSystem Method:
# This is a space-separated list of files and
# directories you want indexed. You can specify
# more than one of these directives.
#
# For the HTTP Method:
# Use the URLs from which you want the spidering
# to begin.
# NOTE: use html files rather than directories
# for this method.
IndexFile /home/sites/home/web/spider
# This is what the generated index file will be.
IndexName "Site Index"
IndexDescription "Site Index Description"
IndexPointer "http://ipaddressofwebserver/"
IndexAdmin "Doug Dzierzak"
# Extra information you can include in the index file.
MetaNames first author
# List of all the meta names used in the file to index, must be on one line.
# If no metanames DO NOT delete the line.
IndexReport 3
# This is how detailed you want reporting. You can specify numbers
# 0 to 3 - 0 is totally silent, 3 is the most verbose.
FollowSymLinks yes
# Put "yes" to follow symbolic links in indexing, else "no".
UseStemming no
# Put yes to apply word stemming algorithm during indexing,
# else no. See the manual for info about stemming. Default is
# no.
PropertyNames author
# List of meta tags names that can be retrieved with the -p option.
# Index size increases according to the formula in the manual.
# Comment out if no PropertyNames. Case insensitive
IgnoreTotalWordCountWhenRanking yes
# Put yes to ignore the total number of words in the file
# when calculating ranking. Often better with merges and
# small files. Default is no.
#ReplaceRules remove "ghill/"
#ReplaceRules replace "[a-z_0-9]*_m.*\\.html" "index.html"
#ReplaceRules replace "/ghill" "moreghillmore"
# ReplaceRules allow you to make changes to file pathnames
# before they're indexed. This directive uses C library
# regex.h regular expressions.
# NOTE: do not use replace "" to remove a string,
# use remove instead - you might get a core dump otherwise.
MinWordLimit 5
# Set the minimum length of an indexable word. Every shorter word
# will not be indexed.
# Commenting out the line will give the defaults
MaxWordLimit 5
# Set the maximum length of an indexable word. Every longer word
# will not be indexed.
# Commenting out the line will give the defaults
WordCharacters abcdefghijklmnopqrstuvwxyz\\&#;0123456789.@|,-'"[](~!@$%^\{\}_+?
# WORDCHARS is a string of characters which SWISH permits to
# be in words. Any strings which do not include these characters
# will not be indexed. You can choose from any character in
# the following string:
#
# abcdefghijklmnopqrstuvwxyz0123456789_\\|/-+=?!@$%^'"`~,.[]\{\}()
#
# Note that if you omit "0123456789&#;" you will not be able to
# index HTML entities. DO NOT use the asterisk (*), lesser than
# and greater than signs (<), (>), or colon (:).
#
# Including any of these four characters may cause funny things to happen.
# NOTE: Do not escape \\ nor " and they cannot be the first letter in the string
# Commenting out the line will give the defaults
BeginCharacters m"
# Of the characters that you decide can go into words, this is
# a list of characters that words can begin with. It should be
# a subset of (or equal to) WordCharacters
# Same rule of syntax as for WordCharacters
EndCharacters \\"\\
# Of the characters that you decide can go into words, this is
# a list of characters that words can end with. It should be
# a subset of (or equal to) WordCharacters
# Same rule of syntax as for WordCharacters
IgnoreLastChar
# Array that contains the chars that, even if considered valid in the middle of
# a word, need to be disregarded when at the end. It is important to also
# set the given chars in the ENDCHARS array, otherwise the word will not
# be indexed because it is considered invalid.
# Commenting out the line will give the defaults
# NOTE: if " is the first char in the string it needs to be escaped with \\
# Do not escape otherwise
IgnoreFirstChar
# Array that contains the chars that, even if considered valid in the middle of
# a word, need to be disregarded when at the beginning. This was to solve
# the problem of parentheses when there is no space between ( and the
# beginning of the word.
# Remember to add the chars to the BEGINCHARS list also.
# Commenting out the line will give the defaults
# NOTE: if " is the first char in the string it needs to be escaped with \\
# Do not escape otherwise
IgnoreLimit 80 256
# This automatically omits words that appear too often in the files
# (these words are called stopwords). Specify a whole percentage
# and a number, such as "80 256". This omits words that occur in
# over 80% of the files and appear in over 256 files. Comment out
# to turn off auto-stopwording.
IgnoreWords SwishDefault
# The IgnoreWords option allows you to specify words to ignore.
# Comment out for no stopwords; the word "SwishDefault" will
# include a list of default stopwords. Words should be separated by spaces
# and may span multiple directives.
IndexComments 0
# This option lets the user decide whether to index the comments in the files.
# Default is 1. Set to 0 if comment indexing is not required.
##################################
# DIRECTIVES for FILESYSTEMS ONLY
# Comment out if using HTTP
###################################
#IndexOnly .html .q
# Only files with these suffixes will be indexed.
#NoContents .gif .xbm .au .mov .mpg .pdf .ps
# Files with these suffixes will not have their contents indexed -
# only their file names will be indexed.
#FileRules pathname contains .*dir1
#FileRules filename contains # % ~ .bak .orig .old old.
#FileRules title contains construction example pointers
#FileRules directory contains .htaccess
#FileRules filename is index
# Files matching the above criteria will *not* be indexed.
# The pattern matching uses the C library regex.h
################################
# DIRECTIVES for HTTP METHOD ONLY
# Comment out if using FILESYSTEM
##################################
MaxDepth 5
#(default 5) This defines how many links the spider should
#follow before stopping. A value of 0 configures the spider to
#traverse all links
Delay 60
#(default 60) The number of seconds to wait between issuing
#requests to a server.
TmpDir /var/tmp
#(default /var/tmp) The location of a writeable temp directory
#on your system. The HTTP access method tells the Perl helper to place
#its files there.
SpiderDirectory /opt/swish-e.x/src/
#(default ./) The location of the Perl helper
#script. Remember, if you use a relative directory, it is relative to
#your directory when you run SWISH-E, not to the directory that SWISH-E
#is in.
EquivalentServer http://publicipaddressofwebserver/
#(default nothing) This allows you to deal with
#servers that respond to multiple DNS names. Each line should have
#a list of all the method/names that should be considered equivalent.
#If you have multiple directives, each one defines its own set of equivalent
#servers.
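The comments on MinWordLimit, MaxWordLimit, WordCharacters, BeginCharacters and EndCharacters can be read as one combined test on each candidate word. The function below is a sketch of that reading only. Its name and signature are hypothetical, it is not taken from the SWISH-E source, and it assumes that every character of a word must appear in WordCharacters and that both length limits are inclusive.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check based only on the comments in the config file.
 * Returns 1 if the word would be kept under this reading, else 0. */
int is_indexable(const char *word, size_t min_len, size_t max_len,
                 const char *word_chars, const char *begin_chars,
                 const char *end_chars)
{
    size_t len;

    assert(word != NULL && word_chars != NULL);
    assert(begin_chars != NULL && end_chars != NULL);

    len = strlen(word);

    /* Shorter than MinWordLimit or longer than MaxWordLimit: rejected. */
    if (len == 0 || len < min_len || len > max_len)
        return 0;

    /* Assumption: every character must be one of WordCharacters. */
    if (strspn(word, word_chars) != len)
        return 0;

    /* First character must be in BeginCharacters. */
    if (strchr(begin_chars, word[0]) == NULL)
        return 0;

    /* Last character must be in EndCharacters. */
    if (strchr(end_chars, word[len - 1]) == NULL)
        return 0;

    return 1;
}
```

Under this reading, setting both MinWordLimit and MaxWordLimit to 5, as the file above does, would keep only words of exactly five characters, and a short BeginCharacters list would narrow that further to words starting with one of the listed characters.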
**** copy of the config.h in the ../swish-e.x/src/ directory ****
/*
** Copyright (C) 1995, 1996, 1997, 1998 Hewlett-Packard Company
** Originally by Kevin Hughes, kev@kevcom.com, 3/11/94
**
** This program and library is free software; you can redistribute it and/or
** modify it under the terms of the GNU (Library) General Public License
** as published by the Free Software Foundation; either version 2
** of the License, or any later version.
**
** This program is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
** GNU (Library) General Public License for more details.
**
** You should have received a copy of the GNU (Library) General Public License
** along with this program; if not, write to the Free Software
** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
**--------------------------------------------------------------------------
** Config file edited by Roy Tennant 2/20/96
** Config file edited by Giulia Hill 2/27/97 to increase length of
** words that are indexed
** Added IGNORELASTCHAR
** G. Hill 3/12/97 ghill@library.berkeley.edu
**
** Added OKNOMETA to allow no failing in case the META name is
** not listed in the config.h
** G. Hill 4/15/97 ghill@library.berkeley.edu
**
** Added IGNOREFIRSTCHAR
** G.Hill 10/16/97 ghill@library.berkeley.edu
**-----------------------------------------------------------------------
** The following are user-definable options that you can change
** to fine-tune SWISH's default options.
*/
#define ALLOW_HTTP_INDEXING_DATA_SOURCE 1
#define ALLOW_FILESYSTEM_INDEXING_DATA_SOURCE 1
/* These symbols allow compile-time elimination of indexing
** data sources. Any Data Source that is allowed by these
** symbols can be selected for indexing from the command line.
** Comment out any options you do not want to support, but
** be sure to leave at least one option.
*/
#define INDEXPERMS 0644
/* After SWISH generates an index file, it changes the permissions
** of the file to this mode. Change to the mode you like
** (note that it must be an octal number). If you don't want
** permissions to be changed for you, comment out this line.
*/
#define PLIMIT 80
#define FLIMIT 256
/* SWISH uses these parameters to automatically mark words as
** being too common while indexing. For instance, if I defined PLIMIT
** as 80 and FLIMIT as 256, SWISH would define a common word as
** a word that occurs in over 80% of all indexed files and over
** 256 files. Making these numbers lower will most likely make your
** index files smaller. Making PLIMIT and FLIMIT small will also
** ensure that searching consumes only so much CPU resources.
*/
#define VERBOSE 2
/* You can define VERBOSE to be a number from 0 to 4. 0 is totally
** silent operation; 4 is very verbose.
*/
#define MAXHITS 100
/* MAXHITS is the maximum number of results to return from a search.
*/
#define DEFAULT_RULE AND_RULE
/* If a list of search words is specified without booleans,
** SWISH will assume they are connected by a default rule.
** This can be AND_RULE or OR_RULE.
*/
#define TITLETOPLINES 12
/* This is how many lines deep SWISH will look into an HTML file to
** attempt to find a <TITLE> tag.
*/
#define EMPHASIZECOMMENTS 0
/* Normally, words within HTML comments are not assigned a higher
** relevance rank. If you're including keywords in comments
** define this as 1 so matching results will rise to the top
** of search results.
*/
#define MINWORDLIMIT 1
/* This is the minimum length of a word. Anything shorter will not
** be indexed.
*/
#define MAXWORDLIMIT 40
/* This is the maximum length of a word. Anything longer will not
** be indexed.
*/
#define ASCIIENTITIES 1
/* If defined as 1, all entities in search words and indexed
** words will be converted to an ASCII equivalent. For instance,
** with this feature you can index the word "resum&eacute;" or
** "resum&#233;" and it will be indexed as the word "resume".
** If defined as 0, only numerical entities will be converted
** to named entities, if they exist.
*/
#define IGNOREALLV 0
#define IGNOREALLC 0
#define IGNOREALLN 0
/* If IGNOREALLV is 1, words containing all vowels won't be indexed.
** If IGNOREALLC is 1, words containing all consonants won't be indexed.
** If IGNOREALLN is 1, words containing all digits won't be indexed.
** Define as 0 to allow words with consistent characters.
** Vowels are defined as "aeiou", digits are "0123456789".
*/
#define IGNOREROWV 60
#define IGNOREROWC 60
#define IGNOREROWN 60
/* IGNOREROWV is the maximum number of consecutive vowels a word can have.
** IGNOREROWC is the maximum number of consecutive consonants a word can have.
** IGNOREROWN is the maximum number of consecutive digits a word can have.
** Vowels are defined as "aeiou", digits are "0123456789".
*/
#define IGNORESAME 100
/* IGNORESAME is the maximum times a character can repeat in a word.
*/
#define WORDCHARS "abcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØOESÙÚÛÜÝYÞàáâãäåæçèéêëìíîïðñòóôõöøoesßùúûüýþÿ¸¡¿&0123456789"
/* WORDCHARS is a string of characters which SWISH permits to
** be in words. Any strings which do not include these characters
** will not be indexed. You can choose from any character in
** the following string:
**
** abcdefghijklmnopqrstuvwxyz0123456789_\|/-+=?!@$%^'\"`~,.[]{}()
**
** Note that if you omit "0123456789&#;" you will not be able to
** index HTML entities. DO NOT use the asterisk (*), lesser than
** and greater than signs (<), (>), or colon (:).
**
** Including any of these four characters may cause funny things to happen.
** If you have a pressing need to index 8-bit characters, please contact
** me for possible user testing in the future.
**
** Also note that if you specify the backslash character (\) or
** double quote (") you need to type a backslash before them to
** make the compiler understand them.
*/
#define BEGINCHARS "abcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØOESÙÚÛÜÝYÞàáâãäåæçèéêëìíîïðñòóôõöøoesßùúûüýþÿ¸¡¿&0123456789(\"'"
/* Of the characters that you decide can go into words, this is
** a list of characters that words can begin with. It should be
** a subset of (or equal to) WORDCHARS.
*/
#define ENDCHARS "abcdefghijklmnopqrstuvwxyz\\ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØOESÙÚÛÜÝYÞàáâãäåæçèéêëìíîïðñòóôõöøoesßùúûüýþÿ¸¡¿.0123456789,'\")"
/* This is the same as BEGINCHARS, except you're testing for
** valid characters at the ends of words.
*/
/* Note that if you really want to edit the default stopwords, (words
** that are deemed too common to be indexed) then you can do so in the
** file "swish.h". They don't have to be in alphabetical order.
*/
#define IGNORELASTCHAR ";.,')\""
/* Array that contains the chars that, even if considered valid in the middle of
** a word, need to be disregarded when at the end. It is important to also
** set the given chars in the ENDCHARS array, otherwise the word will not
** be indexed because it is considered invalid.
** If none, just leave the empty list "". Do not erase the line.
*/
#define IGNOREFIRSTCHAR "('\""
/* Array that contains the chars that, even if considered valid in the middle of
** a word, need to be disregarded when at the beginning. This was to solve
** the problem of parentheses when there is no space between ( and the
** beginning of the word.
** Remember to add the chars to the BEGINCHARS list also.
** If none, just leave the empty list "". Do not erase the line.
*/
#define OKNOMETA 1
/* Switch that defines whether indexing may continue rather than fail when the
** META name is not listed in the METANAMES variable. A value of 1 causes the
** word to be listed as a regular word with no metaName attached.
*/
#define IGNORE_STOPWORDS_IN_QUERY 1
/* Added JM 1/10/98. Setting this to 0 (default) causes a stopword in
** an AND_RULE search to create an empty result. Setting it to 1 simply
** ignores the stopwords and does a search on the remaining words.
*/
#define INDEXTAGS 0
/* Normally, all data in tags in HTML files (except for words in
** comments or meta tags) is ignored. If you want to index HTML files with the
** text within tags and all, define this to be 1 and not 0.
** NOTE: if you set it to 1 you will not be able to do context nor
** metaNames searches, as tags are just plain text with no specific
** meaning.
*/
/* Set this variable to 1 if you are compiling under Win32
define _WIN32 0
*/
/* --- BEGIN PORTING-RELATED SYMBOLS --- */
#ifdef _WIN32
#define NO_SYMBOLIC_FILE_LINKS /* Win32 has no symbolic links */
#endif
#ifdef _WIN32
#undef INDEXPERMS /* Win32 version doesn't use chmod() */
#endif
#ifdef _WIN32
typedef int pid_t; /* process ID */
#endif
/* == compiler stuff == */
#define FUNCTION_PROTOTYPES_INCLUDE_ARGS /* comment out for non-ANSI compilers */
/*
** Use the _AP() macro in all function definitions (in header files)
** to support both ANSI and non-ANSI compilers.
** Instead of:
** void somefunction();
** or:
** void somefunction(int, int, int);
** use:
** void somefunction _AP ((int, int, int));
*/
#ifdef FUNCTION_PROTOTYPES_INCLUDE_ARGS
#define _AP(args) args
#else
#define _AP(args) ()
#endif
#define SUPPORT_DOC_PROPERTIES 1
/* #define NEXTSTEP */
/* You may need to define this if compiling on a NeXTstep machine.
*/
/* --- END PORTING-RELATED SYMBOLS --- */
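The PLIMIT/FLIMIT comment in config.h (and the matching IgnoreLimit 80 256 line in the config file) describes a two-part test for treating a word as too common. The function below is a minimal sketch of that rule as the comment states it. The name is_too_common and its parameters are hypothetical and are not taken from the SWISH-E source.

```c
#include <assert.h>

/* Hypothetical illustration of the rule in the PLIMIT/FLIMIT comment:
 * a word is too common when it occurs in over plimit percent of all
 * indexed files AND in over flimit files. Returns 1 or 0. */
int is_too_common(long files_with_word, long total_files,
                  int plimit, long flimit)
{
    assert(total_files > 0);
    assert(files_with_word >= 0 && files_with_word <= total_files);

    /* Compare percentages with integer arithmetic to avoid rounding:
     * files_with_word / total_files > plimit / 100 */
    return files_with_word * 100 > (long)plimit * total_files
        && files_with_word > flimit;
}
```

With PLIMIT 80 and FLIMIT 256, a word found in 900 of 1000 files meets both conditions, while a word found in 250 of 300 files exceeds 80% but not the 256-file count, so under this reading it would still be indexed.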
Received on Thu Nov 2 21:15:12 2000