Thanks very much. This part is more or less solved, though I am still getting
these errors for .doc and .pdf files (I presume because they were not
originally .html files):
error: htmlParseEntityRef: no name
I have added this filtering:
FileFilter .pdf pdftotext "'%p' -"
FileFilter .doc catdoc "-s8859-1 -d8859-1 %p"
FileFilter .xls xls2csv "-s8859-1 -d8859-1 %p"
Thanks
Michael
Dr Michael Daly wrote on 3/14/12 9:14 PM:
> The funny thing is that *no* FileFilter options are specified in my
> swish1.conf:
>
> IndexOnly .htm .html .txt .doc .pdf .xls
> IndexContents TXT* .txt
> DefaultContents HTML*
>
> I can see both /opt/bin/catdoc and /opt/bin/pdftotext, with /opt/bin
> being in $PATH, so I presume there must be some hard coding within
> swish-e that picks them up without configuring e.g. FileFilter.
>
> Should these directives be added?:
> FileFilter .pdf pdf2html
> FileFilter .pdf pdftotext "'%p' -"
> FileFilter .doc /opt/bin/catdoc "-s8859-1 -d8859-1 %p"
>
> If not, can the parsing errors be ignored?
>
swish-e is trying to parse your .pdf as HTML, because you've not specified a
filter. You must specify a filter for anything that is not txt, html or xml.
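
As a rough sketch of what that could look like in swish1.conf, using the
directives already quoted in this thread: the filter lines are the ones
proposed above, and the extended IndexContents line is an assumption, not
something confirmed here. The idea is that filtered output is plain text, so
leaving it under DefaultContents HTML* may still hand it to the HTML parser
and trigger htmlParseEntityRef warnings on bare "&" characters.

```
# Limit indexing to these extensions (as in the original config)
IndexOnly .htm .html .txt .doc .pdf .xls

# Convert non-text formats to plain text before indexing.
# Assumes pdftotext, catdoc and xls2csv are found via $PATH.
FileFilter .pdf pdftotext "'%p' -"
FileFilter .doc catdoc "-s8859-1 -d8859-1 %p"
FileFilter .xls xls2csv "-s8859-1 -d8859-1 %p"

# Assumption: treat the filtered output as text rather than HTML
IndexContents TXT* .txt .pdf .doc .xls
DefaultContents HTML*
```

If the warnings persist with the text parser selected, the filter output
itself is worth checking by running each command by hand on a sample file.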
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users(at)not-real.lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 15 2012 - 13:15:25 GMT