To use StoreDescription for XML, you need to specify a tag in the XML from
which to extract the description text, and the same is true for HTML.
This makes sense for HTML, which is a single standard where you can use,
e.g., <body> as the StoreDescription tag. It doesn't seem to make sense
for XML, which is extensible, so you define your own tags and format. The
files you are indexing could be many different types of XML, and there
may be no single XML tag that they all share which could serve as the
common StoreDescription tag.
So it seems StoreDescription should be changed for XML files either to
allow entire XML files to be stored (up to some number of characters, as
is done for TXT descriptions) or to allow multiple tags to be specified.
Is there any way in the current Swish-e to work around this and store the
entire contents of an XML file as its description?
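To make the request concrete, here is a minimal Python sketch of the
proposed behaviour: collect the text of every element regardless of its
tag name, collapse whitespace, and truncate to a character limit the way
TXT descriptions are. It only illustrates the idea; it is not Swish-e code
or configuration, and the function name xml_description and the
500-character default are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_description(xml_text, max_chars=500):
    """Hypothetical helper: build a description from ALL character data in
    an XML document, whatever its tags are, truncated to max_chars."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text and tail fragment in document order,
    # so no tag name has to be known in advance.
    words = " ".join(root.itertext()).split()
    # Collapse runs of whitespace and cut to the requested length.
    return " ".join(words)[:max_chars]


if __name__ == "__main__":
    sample = "<report><title>Q4</title> <body>Sales rose.</body></report>"
    print(xml_description(sample))
```

The same function works for any XML vocabulary, which is exactly what a
single StoreDescription tag cannot do.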
Finally, this has probably been asked before, but is there a Linux filter
for indexing MS PowerPoint files (i.e. something like pdftotext for PDF)?
I haven't been able to find a good free one, and was thinking of just
using the "strings" command to extract printable strings from each file,
but I'd like to know if there is anything better.
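For reference, a rough Python equivalent of the "strings" approach: scan
the raw bytes for runs of at least four printable ASCII characters (four
is GNU strings' default minimum) and print each run on its own line. The
name printable_strings is hypothetical, and like "strings" itself this
misses any text stored in a non-ASCII encoding.

```python
import re
import sys
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (space through '~', plus tab) that are
    at least min_len bytes long, roughly like `strings -n min_len`."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Usage: python strings_sketch.py file.ppt > file.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for s in printable_strings(fh.read()):
                print(s)
```

Its output could then be indexed as plain text, in the same way as the
output of pdftotext.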
Thanks,
Andrew Smith
Received on Thu Jan 23 23:03:53 2003