Folks,
I am running SWISH++ 6.1.2 and trying to index 66,542 files with the following command:
$ cat myfiles-wget.log | httpindex -e'html:*' > idx-results.log
The tail of idx-results.log is shown below. After a while the process exits, the output in idx-results.log halts abruptly in mid-line, and swish++.index is left at size 0. The run takes a long time, so this result is very frustrating. My swish++.conf is included below as well. What is going wrong?
TIA
$ tail idx-results.log
createtopic.php?method=newtopic&forum=7&sid=200509011 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509012 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509013 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509014 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509015 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509016 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509017 (488 words)
createtopic.php?method=newtopic&<EOF>
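The tail above shows only standard output, so any error message and the exit status of the run are lost. A minimal sketch of how they could be captured on the next attempt follows. The function name run_logged and the log name idx-errors.log are made up for illustration and are not part of SWISH++.

```shell
#!/bin/sh
# Sketch: run a command with stdout and stderr kept in separate logs and
# record its exit status. run_logged is a hypothetical helper, not a
# SWISH++ tool.
run_logged() {
    # $1 = file fed to stdin, $2 = stdout log, $3 = stderr log,
    # remaining arguments = the command to run
    input=$1; out=$2; err=$3
    shift 3
    "$@" < "$input" > "$out" 2> "$err"
    status=$?
    # Append the status so it survives even if nobody is watching the run.
    echo "exit status: $status" >> "$err"
    return "$status"
}

# Intended use with the command from this post (not executed here):
# run_logged myfiles-wget.log idx-results.log idx-errors.log httpindex -e'html:*'
```

In a POSIX shell, a status above 128 normally means the process was killed by a signal (for example, 137 is 128 + 9, SIGKILL), which would help separate a crash or kill from an ordinary error exit.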
$ cat swish++.conf
Incremental no
#
# used by: index; when "yes", same as the -I option.
#
# When "yes", incrementally index files and add them to an existing
# index.
IndexFile /home/www-data/public_html5/swish++.index
#
# used by: index, search; same as the -i option.
#
# The name of the index file either generated or searched.
#LaunchdCooperation no
#
# used by: search; same as the -l option
#
# If "search" is run as a daemon, cooperate with Mac OS X's launchd(8) by
# not "daemonizing" itself since launchd handles that. When "yes", this
# forces "SearchBackground no".
#
# This option is available only under Mac OS X, should be used only for
# version 10.4 (Tiger) or later, and only when search will be started via
# launchd.
PidFile /var/run/search.pid
#
# used by: search; same as the -P option
#
# If "search" is run as a daemon, record its process ID in this file.
RecurseSubdirs no
#
# used by: index, extract; when "no", same as the -r option.
#
# When "no", do not recursively index the files in subdirectories, that
# is when a directory is encountered, all the files in that directory are
# indexed (modulo the filename patterns specified via the IncludeFile,
# ExcludeFile, or ExtractFile variables), but subdirectories encountered
# are ignored and therefore the files contained in them are not indexed.
# (This variable is most useful when specifying the directories and files
# via standard input.) The default is to index the files in
# subdirectories recursively.
ResultsMax 20
#
# used by: search; same as the -m option.
#
# The maximum number of results to return overriding the compiled-in
# default (which is usually 100).
ResultSeparator " "
#
# used by: search; same as the -R option
#
# The string to separate the parts in a search result when ResultsFormat
# is "classic". Either single or double quotes can be used to preserve
# whitespace. Quotes are stripped only if they match.
ResultsFormat classic
#
# used by: search; same as the -F option
#
# The output format of search results: either "classic" or "XML".
SearchBackground yes
#
# used by: search; when "no", same as the -B option.
#
# When "yes" and SearchDaemon is not "none", automatically detach from
# the terminal and run in the background.
#
# This option is overridden by "LaunchdCooperation yes".
SearchDaemon none
#
# used by: search; same as the -b option.
#
# When not "none", run "search" as a daemon process listening to either a
# Unix domain ("unix") or TCP socket ("tcp") or both ("both") for
# requests.
SocketAddress *:1967
#
# used by: search; same as the -a option.
#
# Default IP address and port of the TCP socket; used only when
# SearchDaemon is either "tcp" or "both".
SocketFile /home/www-data/public_html5/tmp/search.socket
#
# used by: search; same as the -u option.
#
# Default name of the Unix domain socket file; used only when
# SearchDaemon is either "unix" or "both".
SocketQueueSize 511
#
# used by: search; same as the -q option.
#
# Maximum number of queued connections for a socket; used only when
# SearchDaemon is not "none". The default 511 value is taken from
# httpd.h in Apache:
#
# It defaults to 511 instead of 512 because some systems store it
# as an 8-bit datatype; 512 truncated to 8-bits is 0, while 511
# is 255 when truncated.
#
# If it's good enough for Apache, it's good enough for us.
SocketTimeout 10
#
# used by: search; same as the -o option.
#
# Number of seconds a client has to complete a search request before
# being disconnected. This is to prevent a client from connecting, not
# completing a request, and causing the thread servicing the request to
# wait forever. This is used only when SearchDaemon is not "none".
StemWords no
#
# used by: search; when "yes", same as the -s option.
#
# Perform stemming (suffix stripping) on words during searches. Words
# that end in the wildcard character are not stemmed.
#StopWordFile custom_stop_word_file
#
# used by: index, extract; same as the -s option.
#
# The name of a file containing the set of stop-words to use instead of
# the built-in set.
StoreWordPositions yes
#
# used by: index; when "no", same as the -P option.
#
# Store word positions during indexing needed to do "near" searches.
# Storing said data approximately doubles the size of the generated
# index.
TempDirectory /home/www-data/public_html5/tmp
#
# used by: index
#
# Directory to use for temporary files during indexing. If your OS
# mounts swap space on /tmp, as indexing progresses and more files get
# created in /tmp, you will have less swap space, indexing will get
# slower, and you may run out of memory. If this is the case, you can
# specify a directory on a real filesystem, i.e., one on a physical
# disk. The directory must exist.
ThreadsMin 5
ThreadsMax 100
#
# used by: search; same as the -t or -T option, respectively.
#
# The minimum/maximum number of simultaneous threads, respectively; used
# only when SearchDaemon is not "none".
ThreadTimeout 30
#
# used by: search; same as the -O option.
#
# Number of seconds until an idle spare thread times out and destroys
# itself; used only when SearchDaemon is not "none".
TitleLines 12
#
# used by: index; same as the -t option.
#
# For HTML and XHTML files only, the maximum number of lines into a file
# to look at for HTML and XHTML <TITLE> tags. The default is 12. Larger
# numbers slow indexing.
Verbosity 4
#
# used by: index, extract; same as the -v option.
#
# Print additional information to standard output during indexing or
# extraction. The verbosity levels are 0-4; see index(1) or extract(1)
# for details.
WordFilesMax infinity
#
# used by: index; same as the -f option.
#
# The maximum number of files a word may occur in before it is discarded
# as being too frequent. The default is infinity.
WordPercentMax 101
#
# used by: index; same as the -p option.
#
# The maximum percentage of files a word may occur in before it is
# discarded as being too frequent. If you want to keep all words
# regardless, specify 101.
WordsNear 10
#
# used by: search; same as the -n option.
#
# The maximum number of words apart two words can be to be considered
# "near" each other.
WordThreshold 250000
#
# used by: index; same as the -W option.
#
# The word count past which partial indices are generated and merged
# since all the words are too big to fit into memory at the same time.
# If you index and your machine begins to swap like mad, lower this
# value. The above works OK in a 64MB machine. A rule of thumb is to
# add 250000 words for each additional 64MB of RAM you have. These
# numbers are for a SPARC machine running Solaris. Other machines
# running other operating systems use memory differently. You simply
# have to experiment. Only the super-user can specify a value larger
# than the compiled-in default.
# the end
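
For reference, the WordThreshold rule of thumb quoted in the comments above (250000 words per 64MB of RAM) can be worked out as in this rough sketch. The function name is made up, and clamping machines with less than 64MB to a single unit is my own assumption. The comments also warn that the figures were measured on SPARC/Solaris, so the result is only a starting point for experiment.

```shell
#!/bin/sh
# Sketch of the rule of thumb from the swish++.conf comments:
# 250000 words for each 64MB of RAM. word_threshold_for_ram is a
# hypothetical helper, not part of SWISH++.
word_threshold_for_ram() {
    ram_mb=$1
    units=$((ram_mb / 64))      # whole 64MB units, rounded down
    if [ "$units" -lt 1 ]; then
        units=1                 # assumption: never go below one unit
    fi
    echo $((units * 250000))
}

# Example: word_threshold_for_ram 256
```

Per the same comments, only the super-user can set a value larger than the compiled-in default.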
Received on Tue Sep 6 08:29:15 2005