PDF-Files title in search results as Filename ???

From: Scheermann Leonard <Leonard.Scheermann(at)not-real.DLE-M.Bayern.de>
Date: Wed Feb 16 2005 - 16:28:09 GMT
Hi All!

I have the same problem as the one described in this discussion:
http://www.swish-e.org/Discussion/archive/2004-10/8357.html

I use Swish-e 2.4.3 on Linux and index files with -fs and the DirTree.pl method.

PDF files are indexed correctly, but in the search results the <swishtitle>
property displays the PDF filename instead of the title of the PDF file.
The same problem occurs with Word and Excel files, whereas HTML files are
displayed with their titles.


My Config-File for indexing:
#############################################
#  Swish-e config to index Intranet files   #
#############################################

IndexDir /srv/www/htdocs/

FollowSymLinks yes
   
IndexName "Test"

IndexDescription "test index file"

IndexFile /home/swishe/swish-e/index/fs.index

FileFilter .pdf /home/swishe/swish-e/lib/swish-e/DirTree.pl

IndexContents HTML2 .htm .html .pdf
IndexContents TXT2 .doc .xls

IndexOnly .htm .html .pdf .txt .doc .xls

ReplaceRules replace "/srv/www/htdocs/" "http://127.0.0.1/"

PropertyNamesDate created_on

PropertyNames title author

Metanames swishtitle swishdocpath swishlastmodified

UndefinedMetaTags ignore

MetaNames automatic

StoreDescription TXT* 200000
StoreDescription HTML* <body> 200000

FileRules pathname contains '/0_'

FileRules filename contains '/0_'

IndexReport 3

ParserWarnLevel 1

#############################################
#               end of file                 #
#############################################

The results of indexing:
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 95,365 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
95,365 unique words indexed.
8 properties sorted.
2,596 files indexed.  138,964,590 total bytes.  1,778,584 total words.
Elapsed time: 00:03:17 CPU time: 00:00:24
Indexing done!

The command "swish-filter-test -content ../test/foo.pdf"
gives:
"<meta name="title" content="PDF-File-Title">"

However, the command "swish-e -f /home/swishe/swish-e/index/fs.index -w
PDF-File-Title -x '<swishtitle>\n'" gives:
"foo.pdf"
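
To make the mismatch easy to check for other files, here is a minimal sketch in Python (not part of Swish-e). It pulls the title out of filter output like the one above and compares it with the value returned for <swishtitle>. The names extract_meta_title and title_is_lost are my own, and the sketch assumes the filter output contains a <meta name="title"> tag as shown above.

```python
from html.parser import HTMLParser


class MetaTitleParser(HTMLParser):
    """Collects the content of the first <meta name="title"> tag."""

    def __init__(self):
        super().__init__()
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching meta tag is of interest.
        if tag != "meta" or self.title is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "title":
            self.title = attr_map.get("content")


def extract_meta_title(filter_output):
    """Return the title found in the filter output, or None if there is none."""
    parser = MetaTitleParser()
    parser.feed(filter_output)
    parser.close()
    return parser.title


def title_is_lost(filter_output, search_title, filename):
    """True if the filter produced a title but the search result shows the filename."""
    meta_title = extract_meta_title(filter_output)
    return (
        meta_title is not None
        and meta_title != filename
        and search_title == filename
    )


if __name__ == "__main__":
    # Values taken from the two commands quoted in this message.
    filter_output = '<meta name="title" content="PDF-File-Title">'
    print(extract_meta_title(filter_output))
    print(title_is_lost(filter_output, "foo.pdf", "foo.pdf"))
```

With the values from this message, the filter output carries a title while the search result only shows the filename, so title_is_lost reports the problem I am describing.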

Does anyone have an idea?
Thanks in advance
Leonard Scheermann
Received on Wed Feb 16 08:28:10 2005