Hello, Bill,
I installed swish-e on a Windows 2000 server. Indexing HTML files works
without problems, but swish-e cannot index Word documents that have white
space in the file name: "WordFile.doc" is indexed, but "Copy of WordFile.doc"
is not. It also cannot index a directory with a space in its name:
"C:/MyTest/" works, but "C:/My Test/" does not.
I am using swish-e 2.2.1 and catdoc-0.93-win32 on Windows 2000.
Thanks in advance,
Jim
############################
#swish-e.config
IndexDir C:/MyTest/ C:/My Test/
IndexFile C:/wwwroot/cgi-bin/index/index.swish-e
#ReplaceRules remove "C:/wwwroot/"
IndexOnly .html .htm .pdf .doc
FileFilter .pdf c:/wwwroot/cgi-bin/xpdf/pdftotext.exe '"%p" -'
FileFilter .doc C:/wwwroot/cgi-bin/catdoc/catdoc.exe "%p"
#StoreDescription TXT 250000
StoreDescription HTML <body> 50000
#end of config
##########################
###########################
# run swish-e.bat
Indexing Data Source: "File-System"
Indexing "C:/MyTest/"
Checking dir "C:/MyTest/" ..
Copy of HtmlFile.htm - Using DEFAULT <HTML2> parser - < 132 words>
WordFile.doc - Using DEFAULT <HTML2> parser - < 551 words>
PdfFile.pdf - Using DEFAULT <HTML2> parser - < 1520 words>
HtmlFile.htm - Using DEFAULT <HTML2> parser - < 132 words>
Copy of WordFile.doc - Using DEFAULT <HTML2> parser - catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
<no words indexed>
Copy of PdfFile.pdf - Using DEFAULT <HTML2> parser - < 1520 words>
Indexing "C:/My"
Warning: Invalid path 'C:/My' : No such file or directory
Indexing "Test/"
Warning: Invalid path 'Test' : No such file or directory
Removing very common words...
no words removed.
writing main index...
sort words...
sorting 626 words alphabetically
writing header...
writing index entries ...
writing word text: complete
writing word hash: complete
writing word data: complete
262 unique words indexed
5 properties sorted
6 files indexed. 309712 total bytes. 3855 total words.
elapsed time: 00:00:01 CPU time: 00:00:01
indexing done!
##end
######################################
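One reading of the log above is that both failures come from values being split on white space. The warnings for 'C:/My' and 'Test' match what a plain whitespace split of the IndexDir line would produce, and the three "No such file or directory" messages from catdoc match the three words in "Copy of WordFile.doc". The .pdf filter, whose %p sits inside an extra pair of quotes, did index "Copy of PdfFile.pdf". The Python sketch below only illustrates that splitting behaviour. It is not swish-e's parser, and the quoted forms it uses are assumptions about what might help, not confirmed swish-e syntax.

```python
import shlex

# Directive copied verbatim from the config above.
index_dir = 'IndexDir C:/MyTest/ C:/My Test/'

# Plain whitespace splitting: the second directory breaks into two tokens,
# the same two paths the log warns about.
naive_dirs = index_dir.split()[1:]
print(naive_dirs)

# Hypothetical quoted form of the same directive, split with quote awareness.
quoted_dir = 'IndexDir C:/MyTest/ "C:/My Test/"'
quoted_dirs = shlex.split(quoted_dir)[1:]
print(quoted_dirs)

# The same effect on a filter command when the file path is substituted
# without surrounding quotes: the program receives three separate arguments.
path = 'C:/MyTest/Copy of WordFile.doc'
unquoted_args = f'catdoc.exe {path}'.split()[1:]
quoted_args = shlex.split(f'catdoc.exe "{path}"')[1:]
print(unquoted_args)
print(quoted_args)
```

If this reading is right, the .doc filter may need the same inner quoting around %p that the .pdf filter already has, and the "C:/My Test/" entry may need quotes of its own.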
Received on Wed Oct 1 05:49:20 2003