Hi All!
I have the same problem as in this discussion:
http://www.swish-e.org/Discussion/archive/2004-10/8357.html
I use Swish-e 2.4.3 on Linux and index files using -fs and the DirTree.pl method.
PDF files are indexed correctly, but in the search results the <swishtitle>
property displays the PDF file names instead of the titles of the PDF files.
The same problem occurs with Word and Excel files. HTML files, however, are
displayed with their titles.
My config file for indexing:
#############################################
# Swish-e config to index Intranet files #
#############################################
IndexDir /srv/www/htdocs/
FollowSymLinks yes
IndexName "Test"
IndexDescription "test index file"
IndexFile /home/swishe/swish-e/index/fs.index
FileFilter .pdf /home/swishe/swish-e/lib/swish-e/DirTree.pl
IndexContents HTML2 .htm .html .pdf
IndexContents TXT2 .doc .xls
IndexOnly .htm .html .pdf .txt .doc .xls
ReplaceRules replace "/srv/www/htdocs/" "http://127.0.0.1/"
PropertyNamesDate created_on
PropertyNames title author
Metanames swishtitle swishdocpath swishlastmodified
UndefinedMetaTags ignore
MetaNames automatic
StoreDescription TXT* 200000
StoreDescription HTML* <body> 200000
FileRules pathname contains '/0_'
FileRules filename contains '/0_'
IndexReport 3
ParserWarnLevel 1
#############################################
# end of file #
#############################################
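To make the intended behaviour of this config explicit, here is a rough Python model of how it should treat a single path: the IndexOnly suffix check, the two FileRules exclusions, the IndexContents parser choice and the ReplaceRules URL rewrite. This is a simplified sketch under assumptions, not swish-e's actual logic: it treats "contains" as a plain substring test and "replace" as a plain substring replacement, it does not model the FileFilter step, and the function name route() is hypothetical.

```python
import os

# Values copied from the config above.
INDEX_ONLY = (".htm", ".html", ".pdf", ".txt", ".doc", ".xls")
INDEX_CONTENTS = {
    ".htm": "HTML2", ".html": "HTML2", ".pdf": "HTML2",
    ".doc": "TXT2", ".xls": "TXT2",
}
REPLACE_FROM = "/srv/www/htdocs/"
REPLACE_TO = "http://127.0.0.1/"
EXCLUDE = "/0_"


def route(path):
    """Hypothetical helper: return (parser, url) for an indexed path, or None if skipped.

    Simplified model only: 'contains' is a substring test and
    'replace' is a plain substring replacement.
    """
    # IndexOnly: skip files whose suffix is not listed
    if not path.endswith(INDEX_ONLY):
        return None
    # FileRules pathname contains '/0_': skip matching paths
    if EXCLUDE in path:
        return None
    # FileRules filename contains '/0_': a bare file name has no '/',
    # so under this model this rule never matches
    if EXCLUDE in os.path.basename(path):
        return None
    # IndexContents: choose the parser by suffix; .txt is not listed,
    # so it falls back to whatever default parser swish-e uses
    suffix = os.path.splitext(path)[1]
    parser = INDEX_CONTENTS.get(suffix, "default")
    # ReplaceRules replace: rewrite the file system path into a URL
    url = path.replace(REPLACE_FROM, REPLACE_TO)
    return parser, url
```

Under this model, route("/srv/www/htdocs/docs/foo.pdf") returns ("HTML2", "http://127.0.0.1/docs/foo.pdf"), and anything below a directory starting with "0_" is skipped.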
The results of indexing:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 95,365 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
95,365 unique words indexed.
8 properties sorted.
2,596 files indexed. 138,964,590 total bytes. 1,778,584 total words.
Elapsed time: 00:03:17 CPU time: 00:00:24
Indexing done!
The command "swish-filter-test -content ../test/foo.pdf"
gives:
"<meta name="title" content="PDF-File-Title">"
The command "swish-e -f /home/swishe/swish-e/index/fs.index -w
PDF-File-Title -x '<swishtitle>\n'", however, gives:
"foo.pdf"
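To see exactly which title markup the filter hands to the HTML2 parser, the filtered output can be scanned for both a <title> element and a <meta name="title"> tag. A minimal Python sketch, assuming the output of swish-filter-test -content has been captured as a string; the helper name title_markup() is hypothetical. Whether the difference matters rests on an assumption not confirmed here: that swish-e fills <swishtitle> from an HTML <title> element, while a <meta name="title"> tag is stored as an ordinary "title" metaname/property.

```python
from html.parser import HTMLParser


class _TitleScanner(HTMLParser):
    """Collects the <title> text and any <meta name="title"> content."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_element = None  # stays None if there is no <title>
        self.meta_title = None     # stays None if there is no meta title

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
            self.title_element = ""
        elif tag == "meta" and (attrs.get("name") or "").lower() == "title":
            self.meta_title = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside <title>...</title> is collected
        if self.in_title:
            self.title_element += data


def title_markup(html):
    """Hypothetical helper: return (title_element, meta_title) found in html."""
    scanner = _TitleScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.title_element, scanner.meta_title
```

For the filter output quoted above, title_markup('<meta name="title" content="PDF-File-Title">') reports no <title> element and a meta title of "PDF-File-Title".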
Does anyone have an idea?
Thanks in advance
Leonard Scheermann
Received on Wed Feb 16 08:28:10 2005