Storing Descriptions of html files

From: Timo Haberkern <thaberkern(at)not-real.emedia-office.de>
Date: Fri May 23 2003 - 07:10:03 GMT
Hello again,

I have another question. I index PDF files using pdftohtml (a SourceForge
project) on a Windows 2000 system. My configuration file:

#---------------------------------------------------
#--- SWISH Search Index configuration file
#---------------------------------------------------

#--- Indexing configuration
IndexFile ../../../metadata/search_index/server_x.index
IndexDir E:/easyDocV3/docufiles/server_x/Heizungstechnik/
IndexReport 3

NoContents .jpg .gif .jpeg .png .tif .bmp

#--- File Filter for pdf files
FileFilter .pdf ./filter/pdftohtml.exe "\"%p\" -stdout -q -noframes"

#--- File filter for DOC files (Word)
FileFilter .doc ./filter/antiword/antiword.exe "\"%p\""

#--- File filter for OpenOffice/StarOffice files
#FileFilterMatch "./filter/unzip.exe" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
#IndexContents XML* .ml .sxw .sxc .sxg

StoreDescription XML* <text:p> 320
StoreDescription HTML <body> 320
StoreDescription TXT 320
#---------------------------------------------------
#--- end of configuration
#---------------------------------------------------


My problem is that no description is shown in the search results, even when
I use the -p swishdescription argument.

I took a look at the HTML output of pdftohtml. The body tag carries
attributes, for example: <BODY bgcolor="#A0A0A0" vlink="blue"
link="blue">
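
As a rough illustration of what I expect "StoreDescription HTML <body> 320" to give me, here is a minimal Python sketch that pulls the first 320 characters of body text out of such a page using the standard html.parser module. This is not swish-e's code and says nothing about how its own parser treats the tag. The class name, the helper function and the sample page are made up for the example.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect the text found between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <BODY bgcolor=...> arrives
        # here as "body"; the attributes are passed separately in attrs.
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)


def describe(html, limit=320):
    """Return the first `limit` characters of whitespace-normalized body text."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text[:limit]


# Hypothetical sample, using the body tag quoted above.
sample = (
    '<HTML><HEAD><TITLE>t</TITLE></HEAD>'
    '<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">'
    'Heizungstechnik Handbuch</BODY></HTML>'
)
print(describe(sample))
```

With a generic parser like this one, the attributes on the tag do not get in the way of finding the body text. Whether the same holds for swish-e is exactly what I do not know.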

Could this be causing the problem?

ciao

Timo
Received on Fri May 23 07:10:14 2003