Re: indexing PID exits -> swish++.index 0 bytes

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Sep 06 2005 - 15:33:08 GMT
You're on the wrong list. This is Swish-e, not Swish++.

Try over at http://homepage.mac.com/pauljlucas/software/swish/

oscaruser@programmer.net wrote on 09/06/2005 10:28 AM:
> Folks,
> SWISH++ 6.1.2. I am trying to index 66,542 files and used the following command:
> 
> $ cat myfiles-wget.log | httpindex -e'html:*' > idx-results.log
> 
> The contents of idx-results.log look as follows. After a while the process exits, the output in idx-results.log halts abruptly, and swish++.index is left at 0 bytes. It takes a long time to run, and it is very annoying to see this result. My swish++.conf file is below as well. What is going wrong?
> TIA
>  
> $ tail idx-results.log
> 
> createtopic.php?method=newtopic&forum=7&sid=200509011 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509012 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509013 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509014 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509015 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509016 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509017 (488 words)
> createtopic.php?method=newtopic&<EOF>
> 
> $ cat swish++.conf
> 
> Incremental		no
> #
> # used by: index; when "yes", same as the -I option.
> #
> #	When "yes", incrementally index files and add them to an existing
> #	index.
> 
> IndexFile		/home/www-data/public_html5/swish++.index
> #
> # used by: index, search; same as the -i option.
> #
> #	The name of the index file either generated or searched.
> 
> #LaunchdCooperation	no
> #
> # used by: search; same as the -l option
> #
> #	If "search" is run as a daemon, cooperate with Mac OS X's launchd(8) by
> #	not "daemonizing" itself since launchd handles that.  When "yes", this
> #	forces "SearchBackground no".
> #
> #	This option is available only under Mac OS X, should be used only for
> #	version 10.4 (Tiger) or later, and only when search will be started via
> #	launchd.
> 
> PidFile			/var/run/search.pid
> #
> # used by: search; same as the -P option
> #
> #	If "search" is run as a daemon, record its process ID in this file.
> 
> RecurseSubdirs	no	
> #
> # used by: index, extract; when "no", same as the -r option.
> #
> #	When "no", do not recursively index the files in subdirectories, that
> #	is when a directory is encountered, all the files in that directory are
> #	indexed (modulo the filename patterns specified via the IncludeFile,
> #	ExcludeFile, or ExtractFile variables), but subdirectories encountered
> #	are ignored and therefore the files contained in them are not indexed.
> #	(This variable is most useful when specifying the directories and files
> #	via standard input.)  The default is to index the files in
> #	subdirectories recursively.
> 
> ResultsMax		20
> #
> # used by: search; same as the -m option.
> #
> #	The maximum number of results to return overriding the compiled-in
> #	default (which is usually 100).
> 
> ResultSeparator	" "
> #
> # used by: search; same as the -R option
> #
> #	The string to separate the parts in a search result when ResultsFormat
> #	is "classic".  Either single or double quotes can be used to preserve
> #	whitespace.  Quotes are stripped only if they match.
> 
> ResultsFormat		classic
> #
> # used by: search; same as the -F option
> #
> #	The output format of search results: either "classic" or "XML".
> 
> SearchBackground	yes
> #
> # used by: search; when "no", same as the -B option.
> #
> #	When "yes" and SearchDaemon is not "none", automatically detach from
> #	the terminal and run in the background.
> #
> #	This option is overridden by "LaunchdCooperation yes".
> 
> SearchDaemon		none
> #
> # used by: search; same as the -b option.
> #
> #	When not "none", run "search" as a daemon process listening to either a
> #	Unix domain ("unix") or TCP socket ("tcp") or both ("both") for
> #	requests.
> 
> SocketAddress		*:1967
> #
> # used by: search; same as the -a option.
> #
> #	Default IP address and port of the TCP socket; used only when
> #	SearchDaemon  is either "tcp" or "both".
> 
> SocketFile		/home/www-data/public_html5/tmp/search.socket
> #
> # used by: search; same as the -u option.
> #
> #	Default name of the Unix domain socket file; used only when
> #	SearchDaemon is either "unix" or "both".
> 
> SocketQueueSize		511
> #
> # used by: search; same as the -q option.
> #
> #	Maximum number of queued connections for a socket; used only when
> #	SearchDaemon is not "none".  The default 511 value is taken from
> #	httpd.h in Apache:
> #
> #		It defaults to 511 instead of 512 because some systems store it
> #		as an 8-bit datatype; 512 truncated to 8-bits is 0, while 511
> #		is 255 when truncated.
> #
> #	If it's good enough for Apache, it's good enough for us.
> 
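
On the 511-versus-512 remark in the SocketQueueSize comment just above: here is a minimal sketch of the truncation it describes, assuming the 8-bit datatype is unsigned and that truncation simply keeps the low 8 bits. The function name `truncate_to_8_bits` is made up for illustration and is not part of Swish++ or Apache.

```python
def truncate_to_8_bits(value: int) -> int:
    """Keep only the low 8 bits, as storing into an unsigned 8-bit type would."""
    return value & 0xFF


# Compare the two queue sizes mentioned in the quoted comment.
for queue_size in (511, 512):
    print(queue_size, "->", truncate_to_8_bits(queue_size))
# prints:
# 511 -> 255
# 512 -> 0
```

So on such a system a queue size of 512 would collapse to 0, while 511 still yields the largest value the type can hold.
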
> SocketTimeout		10
> #
> # used by: search; same as the -o option.
> #
> #	Number of seconds a client has to complete a search request before
> #	being disconnected.  This is to prevent a client from connecting, not
> #	completing a request, and causing the thread servicing the request to
> #	wait forever.  This is used only when SearchDaemon is not "none".
> 
> StemWords		no
> #
> # used by: search; when "yes", same as the -s option.
> #
> #	Perform stemming (suffix stripping) on words during searches.  Words
> #	that end in the wildcard character are not stemmed.
> 
> #StopWordFile		custom_stop_word_file
> #
> # used by: index, extract; same as the -s option.
> #
> #	The name of a file containing the set of stop-words to use instead of
> #	the built-in set.
> 
> StoreWordPositions	yes
> #
> # used by: index; when "no", same as the -P option.
> #
> #	Store word positions during indexing needed to do "near" searches.
> #	Storing said data approximately doubles the size of the generated
> #	index.
> 
> TempDirectory		/home/www-data/public_html5/tmp
> #
> # used by: index
> #
> #	Directory to use for temporary files during indexing.  If your OS
> #	mounts swap space on /tmp, as indexing progresses and more files get
> #	created in /tmp, you will have less swap space, indexing will get
> #	slower, and you may run out of memory.  If this is the case, you can
> #	specify a directory on a real filesystem, i.e., one on a physical
> #	disk.  The directory must exist.
> 
> ThreadsMin		5
> ThreadsMax		100
> #
> # used by: search; same as the -t or -T option, respectively.
> #
> #	The minimum/maximum number of simultaneous threads, respectively; used
> #	only when SearchDaemon is not "none".
> 
> ThreadTimeout		30
> #
> # used by: search; same as the -O option.
> #
> #	Number of seconds until an idle spare thread times out and destroys
> #	itself; used only when SearchDaemon is not "none".
> 
> TitleLines		12
> #
> # used by: index; same as the -t option.
> #
> #	For HTML and XHTML files only, the maximum number of lines into a file
> #	to look at for HTML and XHTML <TITLE> tags.  The default is 12.  Larger
> #	numbers slow indexing.
> 
> Verbosity	4	
> #
> # used by: index, extract; same as the -v option.
> #
> #	Print additional information to standard output during indexing or
> #	extraction.  The verbosity levels are 0-4; see index(1) or extract(1)
> #	for details.
> 
> WordFilesMax		infinity
> #
> # used by: index; same as the -f option.
> #
> #	The maximum number of files a word may occur in before it is discarded
> #	as being too frequent.  The default is infinity.
> 
> WordPercentMax		101
> #
> # used by: index; same as the -p option.
> #
> #	The maximum percentage of files a word may occur in before it is
> #	discarded as being too frequent.  If you want to keep all words
> #	regardless, specify 101.
> 
> WordsNear		10
> #
> # used by: search; same as the -n option.
> #
> #	The maximum number of words apart two words can be to be considered
> #	"near" each other.
> 
> WordThreshold		250000
> #
> # used by: index; same as the -W option.
> #
> #	The word count past which partial indices are generated and merged
> #	since all the words are too big to fit into memory at the same time.
> #	If you index and your machine begins to swap like mad, lower this
> #	value.  The above works OK in a 64MB machine.  A rule of thumb is to
> #	add 250000 words for each additional 64MB of RAM you have.  These
> #	numbers are for a SPARC machine running Solaris.  Other machines
> #	running other operating systems use memory differently.  You simply
> #	have to experiment.  Only the super-user can specify a value larger
> #	than the compiled-in default.
> 
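
The rule of thumb in the WordThreshold comment just above (250000 words for a 64MB machine, plus 250000 for each additional 64MB) can be written out as a small calculation. This is only a sketch of that arithmetic: the function name `suggested_word_threshold` is hypothetical, rejecting machines under 64MB is my assumption, and the quoted comment itself says the numbers come from a SPARC machine running Solaris, so treat the result as a starting point for experiments.

```python
BASE_RAM_MB = 64          # machine size the quoted default is said to work on
WORDS_PER_STEP = 250_000  # words allowed per 64MB, per the quoted rule of thumb


def suggested_word_threshold(ram_mb: int) -> int:
    """Apply the quoted rule of thumb: 250000 words per full 64MB of RAM."""
    if ram_mb < BASE_RAM_MB:
        # Assumption: the rule of thumb says nothing about smaller machines.
        raise ValueError("rule of thumb starts at 64MB of RAM")
    additional_steps = (ram_mb - BASE_RAM_MB) // BASE_RAM_MB
    return WORDS_PER_STEP * (1 + additional_steps)


for ram_mb in (64, 128, 256):
    print(ram_mb, "MB ->", suggested_word_threshold(ram_mb))
# prints:
# 64 MB -> 250000
# 128 MB -> 500000
# 256 MB -> 1000000
```

Keep in mind the last sentence of the quoted comment: only the super-user can specify a value larger than the compiled-in default.
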
> # the end
> 
> 

-- 
  Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Tue Sep 6 08:33:12 2005