Extract Path and searching

From: Weir James K Contr ASC/ENOI <James.Weir(at)>
Date: Fri Apr 09 2004 - 16:56:11 GMT
Hello all,

I am running swish-e on a Windows 2000 server with two 966 MHz processors, 2.0 GB of RAM and 90 GB of drive space.
I have set up and indexed a huge number of text files.
Here is the config file for the indexing:

	# Site Index Config File
	# The index file to create
	IndexFile d:/webdata/catalog/temp/sndxap006.index
	# The folder at which to start indexing (/ = webroot in this example)
	IndexDir //somecomputername/Legacy/BROWSE/TEXT
	MetaNames DeptSym
	ExtractPath DeptSym regex !^.*/TEXT/([^/]+)/.*$!$1!
	# Index comments?
	#IndexComments no
	FollowSymLinks yes
	# Files to index
	IndexOnly .txt
	# File Contents
	IndexContents TXT* .txt
	# Filtering files are kept in this folder
	FilterDir D:/webdata/catalog/lib/swish-e
	#Stop Word List
	IgnoreWords File: D:/webdata/catalog/stopwords/english.txt

	# Store the first 500 characters in the index
	StoreDescription TXT* 500
	# Exclude any folder with the following in the path:
	#FileRules dirname contains /_
	#FileRules dirname contains /catalog
	# Fuzzy indexing 
	FuzzyIndexingMode Metaphone
	# Now, specify which meta name to include in the index.
	MetaNames swishdocpath
	#MetaNames keywords title
	#MetaNameAlias keywords author description	
	# By default, undefined meta names are indexed as plain text
	# This feature can change this behaviour. Here we say
	# don't index text in metatags unless defined in MetaNames
	#UndefinedMetaTags ignore

Here is the command line that I use to build the index:
	d:\webdata\catalog\swish-search.exe -c d:\webdata\catalog\sndxap006.config -e -v1 -T REGEX > d:\webdata\catalog\outsndxap006.log

Here is the relevant output from the log file that is created:

	Original String: '//somecomputername/Legacy/BROWSE/TEXT/EN/1/1/EN/document title/somefilename_0458.TXT'
	replace //somecomputername/Legacy/BROWSE/TEXT/EN/1/1/EN/document title/somefilename_0458.TXT =~ m[^.*/TEXT/([^/]+)/.*$][$1]: Matched
	Result String: 'EN'
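
For reference, the ExtractPath substitution shown in the log can be approximated outside swish-e. This is only a minimal Python sketch, assuming Python's `re` module treats this pattern the same way swish-e's regex engine does; the function name `extract_dept_sym` and the 'ENS' sample path are hypothetical, made up for illustration. It suggests the pattern captures the whole first folder under /TEXT/, so 'EN' and 'ENS' folders should come out as distinct values at extraction time.

```python
import re

# Pattern copied from the ExtractPath line in the config above.
# Assumption: swish-e's regex flavour matches Python's for this pattern.
PATTERN = re.compile(r"^.*/TEXT/([^/]+)/.*$")


def extract_dept_sym(path):
    """Return the first path segment after /TEXT/, or None if no match."""
    match = PATTERN.match(path)
    return match.group(1) if match else None


# Path taken from the log output above.
logged_path = (
    "//somecomputername/Legacy/BROWSE/TEXT/EN/1/1/EN/"
    "document title/somefilename_0458.TXT"
)
# Hypothetical path for a document filed under an 'ENS' folder.
ens_path = "//somecomputername/Legacy/BROWSE/TEXT/ENS/1/1/somefile.TXT"

print(extract_dept_sym(logged_path))
print(extract_dept_sym(ens_path))
```

Note that the leading `.*` is greedy, so if a path contained /TEXT/ more than once, the segment after the last occurrence would be captured.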
When I search for documents with DeptSym 'EN', I get back documents from both DeptSym 'EN' and 'ENS'.
Here is the search command I am using:
	%ComSpec% /c D:\WebData\catalog\swish-search.exe -b 0 -m 25 -d:: -p swishdescription -f D:\WebData\catalog\AP006Indexes\sndxap006.index -w foobar and DeptSym=(EN) -T properties > D:\WebData\catalog\1081512252031.txt

Received on Fri Apr 9 09:56:13 2004