To continue, here's the same thing on Windows 98:
E:\Program Files\SWISH-E>cat f
IndexContents HTML2 .htm .html .shtml
StoreDescription HTML2 <body> 200
IndexDir http://community.e-baptisthealth.com/tools/health-and-wellness/retrieve?id=/hic/default
MaxDepth 1
Delay 0
DefaultContents HTML2
IgnoreMetaTags style script
That last line:
IgnoreMetaTags style script
makes the HTML2 parser skip the <script> section.
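To illustrate what "skipping" those tags means for indexing, here is a minimal sketch using Python's standard html.parser. It is not SWISH-E's HTML2 parser. It only assumes the same idea: text inside any tag listed in IgnoreMetaTags (here style and script) never reaches the word list. The names visible_words and SkipTagsText are hypothetical.

```python
from html.parser import HTMLParser


class SkipTagsText(HTMLParser):
    """Collect words from text content, ignoring anything inside the given tags."""

    def __init__(self, ignore=("style", "script")):
        super().__init__()
        self.ignore = set(ignore)
        self.depth = 0   # > 0 while we are inside an ignored element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ignore:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignore and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only text outside ignored elements is split into words.
        if self.depth == 0:
            self.words.extend(data.split())


def visible_words(html, ignore=("style", "script")):
    parser = SkipTagsText(ignore)
    parser.feed(html)
    parser.close()
    return parser.words


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><p>Health tips</p><script>var you = 1;</script></body></html>")
print(visible_words(page))  # ['Health', 'tips']
```

Without the ignore list, words like "var" and "you" from the script body would be indexed along with the page text.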
E:\Program Files\SWISH-E>swish-e -c f -S http
Indexing Data Source: "HTTP-Crawler"
Indexing "http://community.e-baptisthealth.com/tools/health-and-wellness/retrieve?id=/hic/default"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 864 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
864 unique words indexed.
5 properties sorted.
1 file indexed. 76664 total bytes. 1653 total words.
Elapsed time: 00:00:05 CPU time: 00:00:05
Indexing done!
E:\Program Files\SWISH-E>swish-e -w you -x "Title=%t\n\nSummary\n<swishdescription>\n"
# SWISH format: 2.1-dev-25
# Search words: you
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.110 seconds
Title=Baptist Health - Health & Wellness
Summary
Condition Super Centers Diseases, Conditions and Injuries These comprehensive centers provide detailed information on some of the most common health conditions.
This quick-reference guide offers the b
.
Hmm, I thought it knew to truncate on white space instead of in the middle of a word...
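For reference, the behavior I expected from "StoreDescription HTML2 <body> 200" is roughly the following. This is a small Python sketch, not SWISH-E's actual code; the function name truncate_at_space is hypothetical. It cuts to the limit and, if the cut lands mid-word, backs up to the last whitespace (falling back to a hard cut when there is no whitespace at all).

```python
def truncate_at_space(text, limit):
    """Return at most `limit` characters of `text`, cut at a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the next character is not whitespace, the cut is in the middle of a word.
    if not text[limit].isspace():
        boundary = max(cut.rfind(c) for c in " \t\n")
        if boundary > 0:
            cut = cut[:boundary]
        # else: one long word with no whitespace, so keep the hard cut
    return cut.rstrip()


print(truncate_at_space("offers the best health info", 12))  # offers the
```

With a hard cut at 12 characters the same input would give "offers the b", which is the kind of output shown above.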
BTW, you had this in your config:
FileFilter .pdf prog-bin/pdf2html
My guess is that won't work on Windows as written. See the FileFilter section of the docs for more info.
Hope this helps.
Hey David, what's the deal with this? Is the percent sign a shell metacharacter on Windows?
Linux:
> ./swish-e -w you -x '%t\n%t\n' -H 0
Baptist Health - Health & Wellness
Baptist Health - Health & Wellness
Windows:
E:\Program Files\SWISH-E>swish-e -w you -x "%t\n" -H0
Baptist Health - Health & Wellness
E:\Program Files\SWISH-E>swish-e -w you -x "%t\n%t" -H0
t
E:\Program Files\SWISH-E>swish-e -w you -x "%%t\n%%t" -H0
Baptist Health - Health & Wellness
Baptist Health - Health & Wellness
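My guess at what's happening, written as a small Python model. The rules are an assumption that happens to reproduce all three Windows results above: %NAME% is treated as an environment variable reference and replaced by its value (or by nothing if NAME is undefined), %% becomes a literal %, and a lone % with no closing partner is left alone. The function expand_percent is hypothetical and is not how the Windows shell is actually implemented.

```python
def expand_percent(arg, env):
    """Assumed model of %VAR% expansion: %% -> %, %NAME% -> env value or ''."""
    out = []
    i = 0
    while i < len(arg):
        if arg[i] != "%":
            out.append(arg[i])
            i += 1
            continue
        if arg.startswith("%%", i):
            out.append("%")          # doubled percent is an escaped literal %
            i += 2
            continue
        end = arg.find("%", i + 1)
        if end == -1:
            out.append(arg[i:])      # no closing %, keep the rest verbatim
            break
        name = arg[i + 1:end]        # e.g. "t\n" in "%t\n%t"
        out.append(env.get(name, ""))
        i = end + 1
    return "".join(out)


print(expand_percent(r"%t\n%t", {}))    # t
print(expand_percent(r"%%t\n%%t", {}))  # %t\n%t
```

Under this model "%t\n%t" loses everything between the two percent signs, so swish-e only ever sees "t", while doubling the percent signs hands swish-e the intended "%t\n%t".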
--
Bill Moseley
mailto:moseley@hank.org
Received on Thu Jan 17 03:18:56 2002