indexing PID exits -> swish++.index 0 bytes

From: <oscaruser(at)not-real.programmer.net>
Date: Tue Sep 06 2005 - 15:29:11 GMT
Folks,
SWISH++ 6.1.2. I am trying to index 66,542 files and used the following command:

$ cat myfiles-wget.log | httpindex -e'html:*' > idx-results.log

The contents of idx-results.log look as follows. After a while the process exits, the output in idx-results.log abruptly halts, and swish++.index is left at a file size of 0 bytes. Indexing takes a long time to run, so it is very annoying to see this result. My swish++.conf file is below as well. What is going wrong?
TIA
 
$ tail idx-results.log

createtopic.php?method=newtopic&forum=7&sid=200509011 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509012 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509013 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509014 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509015 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509016 (488 words)
createtopic.php?method=newtopic&forum=7&sid=200509017 (488 words)
createtopic.php?method=newtopic&<EOF>
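The last log line stops partway through a file name. When stdout is redirected to a file, programs typically buffer their output in blocks, so a process that is terminated abruptly (for example by a signal) can leave a partial last line like this one. One way to see how httpindex actually ended is to run the same command from a small wrapper and inspect its return code. The sketch below is a hypothetical Python helper, not part of SWISH++. It passes `-ehtml:*`, which is the single argument the shell produces from `-e'html:*'`, and leaves stderr attached to the terminal so that any error message from httpindex is still visible.

```python
# Hypothetical wrapper (not part of SWISH++): runs the equivalent of
#   cat myfiles-wget.log | httpindex -e'html:*' > idx-results.log
# and reports how httpindex ended.
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_indexer(file_list: str, log_path: str) -> str:
    """Feed file_list to httpindex on stdin and write its stdout to log_path."""
    with open(file_list, "rb") as stdin, open(log_path, "wb") as stdout:
        # stderr is inherited, so httpindex error messages still reach the terminal.
        result = subprocess.run(["httpindex", "-ehtml:*"], stdin=stdin, stdout=stdout)
    return describe_exit(result.returncode)
```

Calling `run_indexer("myfiles-wget.log", "idx-results.log")` would print something like "killed by signal SIGKILL" if the process was killed from outside, or a non-zero exit status if httpindex stopped on its own.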

$ cat swish++.conf

Incremental		no
#
# used by: index; when "yes", same as the -I option.
#
#	When "yes", incrementally index files and add them to an existing
#	index.

IndexFile		/home/www-data/public_html5/swish++.index
#
# used by: index, search; same as the -i option.
#
#	The name of the index file either generated or searched.

#LaunchdCooperation	no
#
# used by: search; same as the -l option
#
#	If "search" is run as a daemon, cooperate with Mac OS X's launchd(8) by
#	not "daemonizing" itself since launchd handles that.  When "yes", this
#	forces "SearchBackground no".
#
#	This option is available only under Mac OS X, should be used only for
#	version 10.4 (Tiger) or later, and only when search will be started via
#	launchd.

PidFile			/var/run/search.pid
#
# used by: search; same as the -P option
#
#	If "search" is run as a daemon, record its process ID in this file.

RecurseSubdirs	no	
#
# used by: index, extract; when "no", same as the -r option.
#
#	When "no", do not recursively index the files in subdirectories, that
#	is when a directory is encountered, all the files in that directory are
#	indexed (modulo the filename patterns specified via the IncludeFile,
#	ExcludeFile, or ExtractFile variables), but subdirectories encountered
#	are ignored and therefore the files contained in them are not indexed.
#	(This variable is most useful when specifying the directories and files
#	via standard input.)  The default is to index the files in
#	subdirectories recursively.

ResultsMax		20
#
# used by: search; same as the -m option.
#
#	The maximum number of results to return overriding the compiled-in
#	default (which is usually 100).

ResultSeparator	" "
#
# used by: search; same as the -R option
#
#	The string to separate the parts in a search result when ResultsFormat
#	is "classic".  Either single or double quotes can be used to preserve
#	whitespace.  Quotes are stripped only if they match.

ResultsFormat		classic
#
# used by: search; same as the -F option
#
#	The output format of search results: either "classic" or "XML".

SearchBackground	yes
#
# used by: search; when "no", same as the -B option.
#
#	When "yes" and SearchDaemon is not "none", automatically detach from
#	the terminal and run in the background.
#
#	This option is overridden by "LaunchdCooperation yes".

SearchDaemon		none
#
# used by: search; same as the -b option.
#
#	When not "none", run "search" as a daemon process listening to either a
#	Unix domain ("unix") or TCP socket ("tcp") or both ("both") for
#	requests.

SocketAddress		*:1967
#
# used by: search; same as the -a option.
#
#	Default IP address and port of the TCP socket; used only when
#	SearchDaemon is either "tcp" or "both".

SocketFile		/home/www-data/public_html5/tmp/search.socket
#
# used by: search; same as the -u option.
#
#	Default name of the Unix domain socket file; used only when
#	SearchDaemon is either "unix" or "both".

SocketQueueSize		511
#
# used by: search; same as the -q option.
#
#	Maximum number of queued connections for a socket; used only when
#	SearchDaemon is not "none".  The default 511 value is taken from
#	httpd.h in Apache:
#
#		It defaults to 511 instead of 512 because some systems store it
#		as an 8-bit datatype; 512 truncated to 8-bits is 0, while 511
#		is 255 when truncated.
#
#	If it's good enough for Apache, it's good enough for us.

SocketTimeout		10
#
# used by: search; same as the -o option.
#
#	Number of seconds a client has to complete a search request before
#	being disconnected.  This is to prevent a client from connecting, not
#	completing a request, and causing the thread servicing the request to
#	wait forever.  This is used only when SearchDaemon is not "none".

StemWords		no
#
# used by: search; when "yes", same as the -s option.
#
#	Perform stemming (suffix stripping) on words during searches.  Words
#	that end in the wildcard character are not stemmed.

#StopWordFile		custom_stop_word_file
#
# used by: index, extract; same as the -s option.
#
#	The name of a file containing the set of stop-words to use instead of
#	the built-in set.

StoreWordPositions	yes
#
# used by: index; when "no", same as the -P option.
#
#	Store word positions during indexing needed to do "near" searches.
#	Storing said data approximately doubles the size of the generated
#	index.

TempDirectory		/home/www-data/public_html5/tmp
#
# used by: index
#
#	Directory to use for temporary files during indexing.  If your OS
#	mounts swap space on /tmp, as indexing progresses and more files get
#	created in /tmp, you will have less swap space, indexing will get
#	slower, and you may run out of memory.  If this is the case, you can
#	specify a directory on a real filesystem, i.e., one on a physical
#	disk.  The directory must exist.

ThreadsMin		5
ThreadsMax		100
#
# used by: search; same as the -t or -T option, respectively.
#
#	The minimum/maximum number of simultaneous threads, respectively; used
#	only when SearchDaemon is not "none".

ThreadTimeout		30
#
# used by: search; same as the -O option.
#
#	Number of seconds until an idle spare thread times out and destroys
#	itself; used only when SearchDaemon is not "none".

TitleLines		12
#
# used by: index; same as the -t option.
#
#	For HTML and XHTML files only, the maximum number of lines into a file
#	to look at for HTML and XHTML <TITLE> tags.  The default is 12.  Larger
#	numbers slow indexing.

Verbosity	4	
#
# used by: index, extract; same as the -v option.
#
#	Print additional information to standard output during indexing or
#	extraction.  The verbosity levels are 0-4; see index(1) or extract(1)
#	for details.

WordFilesMax		infinity
#
# used by: index; same as the -f option.
#
#	The maximum number of files a word may occur in before it is discarded
#	as being too frequent.  The default is infinity.

WordPercentMax		101
#
# used by: index; same as the -p option.
#
#	The maximum percentage of files a word may occur in before it is
#	discarded as being too frequent.  If you want to keep all words
#	regardless, specify 101.

WordsNear		10
#
# used by: search; same as the -n option.
#
#	The maximum number of words apart two words can be to be considered
#	"near" each other.

WordThreshold		250000
#
# used by: index; same as the -W option.
#
#	The word count past which partial indices are generated and merged
#	since all the words are too big to fit into memory at the same time.
#	If you index and your machine begins to swap like mad, lower this
#	value.  The above works OK in a 64MB machine.  A rule of thumb is to
#	add 250000 words for each additional 64MB of RAM you have.  These
#	numbers are for a SPARC machine running Solaris.  Other machines
#	running other operating systems use memory differently.  You simply
#	have to experiment.  Only the super-user can specify a value larger
#	than the compiled-in default.

# the end
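The SocketQueueSize comment in the configuration above explains why the default is 511 rather than 512: a system that stores the backlog in an 8-bit field keeps only the low 8 bits of the value. A minimal Python check of that arithmetic (the helper name is illustrative only):

```python
# Worked check of the SocketQueueSize comment: a value stored in an 8-bit
# field keeps only its low 8 bits (i.e., the value modulo 256).

def truncate_to_8_bits(value: int) -> int:
    """Return what an unsigned 8-bit field would hold for value."""
    return value & 0xFF


for requested in (511, 512):
    print(f"SocketQueueSize {requested} -> {truncate_to_8_bits(requested)}")
```

This prints 255 for 511 and 0 for 512, which is why 512 would effectively become a zero-length queue on such systems.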

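The WordThreshold comment above gives a rule of thumb: 250000 words works for a 64MB machine, plus 250000 for each additional 64MB of RAM. The sketch below only does that arithmetic. `suggested_word_threshold` is a hypothetical helper, and it assumes only whole 64MB blocks count. The comment itself warns that the figures come from a SPARC machine running Solaris, that other systems use memory differently, and that only the super-user can go above the compiled-in default, so treat the result as a starting point for experimentation.

```python
# Rule of thumb from the WordThreshold comment in swish++.conf:
# 250000 words for a 64MB machine, plus 250000 per additional 64MB of RAM.
# suggested_word_threshold is a hypothetical helper, not part of SWISH++.

WORDS_PER_64MB = 250_000


def suggested_word_threshold(ram_mb: int) -> int:
    """Starting WordThreshold for a machine with ram_mb megabytes of RAM."""
    if ram_mb < 64:
        raise ValueError("the rule of thumb starts at 64MB")
    # Assumption: count only whole 64MB blocks of RAM.
    return WORDS_PER_64MB * (ram_mb // 64)


for ram in (64, 256, 1024):
    print(f"{ram}MB -> WordThreshold {suggested_word_threshold(ram)}")
```

For 64MB, 256MB and 1024MB this gives 250000, 1000000 and 4000000 respectively.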

Received on Tue Sep 6 08:29:15 2005