Here you go. This is the *output*:
Indexing Data Source: "File-System"
Indexing "/archive/asheville"
Indexing "/archive/baltimore"
Indexing "/archive/birmingham"
Indexing "/archive/central"
Indexing "/archive/cincinnati"
Indexing "/archive/flint"
Indexing "/archive/greensboro"
Indexing "/archive/lasvegas"
Indexing "/archive/milwaukee"
Indexing "/archive/nashville"
Indexing "/archive/oklahoma"
Indexing "/archive/pittsburgh"
Indexing "/archive/portland"
Indexing "/archive/raleigh"
Indexing "/archive/rochester"
Indexing "/archive/tampa"
Indexing "/archive/wggb-springmass"
Indexing "/archive/buffalo"
Indexing "/archive/champaign"
Indexing "/archive/wics-springill"
Indexing "/archive/cedarrapids"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 813,347 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
813,347 unique words indexed.
Sorting property: swishdocpath
Sorting property: swishtitle
Sorting property: swishdocsize
Sorting property: swishlastmodified
Sorting property: swishdescription
Sorting property: year_month
Sorting property: market
7 properties sorted.
6,049,227 files indexed. 3,155,419,777 total bytes. 490,801,524 total words.
Elapsed time: 09:37:42 CPU time: 06:14:06
Indexing done!
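For anyone who wants a throughput figure out of those numbers, here is a rough sketch. The helper name `to_seconds` is made up for this example; the file count and elapsed time are copied from the report above.

```shell
#!/bin/bash
# Hypothetical helper: convert an HH:MM:SS string to seconds.
# awk is used so that fields with leading zeros ("09") are read as decimal.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

files=6049227                      # "files indexed" from the report
elapsed=$(to_seconds 09:37:42)     # "Elapsed time" from the report

# Integer division, so the result is rounded down.
echo "files per second: $(( files / elapsed ))"
```

With the figures above this works out to roughly 174 files per second of wall-clock time.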
*Here's the indexing script:*
#!/bin/bash
# THIS WILL INDEX EVERYTHING, STARTING WITH NOTHING
cd /archive/setup
/bin/touch index.timestamp
# Send stdout to the report first, then point stderr at the same file
/usr/local/bin/swish-e -e -c swish.conf -f avid.index.new > index.report 2>&1
/bin/mv -f avid.index.new avid.index
/bin/mv -f avid.index.new.prop avid.index.prop
/bin/rm -f avid.index.new
/bin/rm -f avid.index.new.prop
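One possible refinement of the script above, as a sketch only: move the new index into place only when the indexer reports success, so a failed run cannot overwrite a good index. The function name `promote_if_ok` is hypothetical, and this assumes swish-e exits with a non-zero status when indexing fails.

```shell
#!/bin/bash
# Hypothetical helper: run an indexing command and promote the new index
# files only if that command exits with status 0.
# Usage: promote_if_ok NEW_INDEX LIVE_INDEX command [args...]
promote_if_ok() {
    local new="$1" live="$2"
    shift 2
    if "$@"; then
        # Indexer succeeded: replace the live index and its property file.
        mv -f "$new" "$live" && mv -f "$new.prop" "$live.prop"
    else
        # Indexer failed: discard partial output and keep the old index.
        echo "indexer failed; keeping existing $live" >&2
        rm -f "$new" "$new.prop"
        return 1
    fi
}
```

It could be called as `promote_if_ok avid.index.new avid.index /usr/local/bin/swish-e -e -c swish.conf -f avid.index.new > index.report 2>&1` in place of the swish-e, mv, and rm lines.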
*Here's the reindexing script:*
#!/bin/bash
# THIS IS THE REINDEXING SCRIPT, WILL REINDEX ANYTHING WITH A TIMESTAMP NEWER THAN index.timestamp
cd /archive/setup
/bin/touch index.timestamp.new
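# Index only files newer than index.timestamp; stdout and stderr both go to the report
/usr/local/bin/swish-e -e -c swish.conf -N index.timestamp -f avid.index.new > reindex.report 2>&1
# Merge the existing index with the incremental one into avid.tmp
/usr/local/bin/swish-e -M avid.index avid.index.new avid.tmp >> reindex.report 2>&1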
/bin/mv -f avid.tmp avid.index
/bin/mv -f avid.tmp.prop avid.index.prop
/bin/rm -f avid.index.new
/bin/rm -f avid.index.new.prop
/bin/cp -p index.timestamp.new index.timestamp
/bin/rm -f index.timestamp.new
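To preview what a nightly run would pick up, something like the following can list the candidate files first. This is an approximation built on `find -newer`, not swish-e's own selection logic, and the function name `changed_since` is made up for the example. It mirrors the `.html` restriction and the published.html exclusion from the config below.

```shell
#!/bin/bash
# Hypothetical helper: list .html files modified after a timestamp file,
# skipping published.html.
# Usage: changed_since STAMP_FILE DIR [DIR...]
changed_since() {
    local stamp="$1"
    shift
    find "$@" -type f -name '*.html' ! -name 'published.html' -newer "$stamp"
}
```

For example, `changed_since index.timestamp /archive/tampa | wc -l` gives a quick count for one market before the real reindex starts.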
*Here's the swish.conf:*
# Swish-e Configuration File
# Let me know if there are problems
IndexAdmin mkralec@sbgnet.com
# Tell swish what to index
IndexDir /archive/asheville
IndexDir /archive/baltimore
IndexDir /archive/birmingham
IndexDir /archive/central
IndexDir /archive/cincinnati
IndexDir /archive/flint
IndexDir /archive/greensboro
IndexDir /archive/lasvegas
IndexDir /archive/milwaukee
IndexDir /archive/nashville
IndexDir /archive/oklahoma
IndexDir /archive/pittsburgh
IndexDir /archive/portland
IndexDir /archive/raleigh
IndexDir /archive/rochester
IndexDir /archive/tampa
IndexDir /archive/wggb-springmass
IndexDir /archive/buffalo
IndexDir /archive/champaign
IndexDir /archive/wics-springill
IndexDir /archive/cedarrapids
# Only index HTML files
IndexOnly .html
# Parse .html files with the HTML2 parser
IndexContents HTML2 .html
# Tell swish what to save the index as
IndexFile avid.index
# Don't index published.html
FileRules filename is published\.html
# Store the body as the description
StoreDescription HTML2 <body>
# Setup market meta name extraction
ExtractPath market regex !^/archive/([^/]+)/.*$!$1!
# Setup year_month meta name extraction
ExtractPath year_month regex !^/archive/[^/]+/([^/]+)/([^/]+)/.*$!$1$2!
# Let swish know about important fields
MetaNames date trt tapenumber
# Let's use the following for search sorting
PropertyNames year_month market
# Ignore Words found to be repetitive
IgnoreWords unknown tape text code archive time cues production date news format
# Index words of at least 2 characters
MinWordLimit 2
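To see what the two ExtractPath patterns pull out of a path, the same regexes can be tried with sed. This is only an illustration: the sample path and its market/year/month layout are assumptions inferred from the year_month name, the function names are made up, and sed writes the backreferences as `\1` where the config uses `$1`.

```shell
#!/bin/bash
# Hypothetical helpers that apply the two ExtractPath patterns with sed -E.

# First path component under /archive, e.g. the market name.
market_of() {
    printf '%s\n' "$1" | sed -E 's!^/archive/([^/]+)/.*$!\1!'
}

# Second and third components joined together, e.g. year followed by month.
year_month_of() {
    printf '%s\n' "$1" | sed -E 's!^/archive/[^/]+/([^/]+)/([^/]+)/.*$!\1\2!'
}

sample=/archive/tampa/2004/10/story.html   # assumed layout, not a real file
market_of "$sample"       # prints: tampa
year_month_of "$sample"   # prints: 200410
```

A path that does not match a pattern passes through sed unchanged, so this is also a quick way to spot files sitting outside the expected directory depth.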
Mike
Bill Moseley wrote:
>On Thu, Oct 28, 2004 at 04:31:46AM -0700, Mike Kralec wrote:
>
>
>>FYI, I just wanted to say that 2.5.2 compiled with the large file
>>support is working great for me with a little over 6 million indexed
>>files. I'm re-indexing nightly and merging works fine also. I'm up to
>>around 2.5GB with the prop file now.
>>
>>
>
>Pushing the envelope, I see. Once again, thanks Jose!
>
>Can you post output from indexing to see number of files/words and
>indexing time.
Received on Fri Oct 29 04:48:02 2004