David Brown wrote on 9/1/10 8:26 PM:
> Side question: I thought I read somewhere that swish-e's HTML parser will
> weight text in headings more heavily than in regular text, but I'm not
> finding that in the documentation. Is this in fact the case? If not, then I
> might as well just stick with pdftotext. If it does, then I'll try harder
> to get pdftohtml 0.40a compiled or look into any alternatives you all might
> suggest.
You can limit or weight matching terms based on their context (MetaName), but the
parser doesn't do anything special with headings by default.
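To illustrate "limit or weight by context", here is a toy model in Python. It is NOT swish-e's ranking algorithm. It only shows the idea: each document is split into hypothetical contexts (e.g. `title`, `h1`, `body`, standing in for MetaNames). By default every hit counts the same. A caller can either restrict matching to some contexts or bias some contexts more heavily. The function names and weight values are made up for this sketch.

```python
from collections import Counter


def term_hits(doc: dict, term: str) -> Counter:
    """Count whole-word occurrences of `term` in each context of `doc`."""
    hits = Counter()
    for context, text in doc.items():
        hits[context] = text.lower().split().count(term.lower())
    return hits


def score(doc: dict, term: str, weights=None, limit_to=None) -> int:
    """Toy relevance score.

    weights:  optional {context: multiplier}; contexts not listed count 1x,
              so with no weights every hit is treated the same.
    limit_to: optional set of contexts; hits elsewhere are ignored.
    """
    hits = term_hits(doc, term)
    if limit_to is not None:
        hits = Counter({c: n for c, n in hits.items() if c in limit_to})
    weights = weights or {}
    return sum(n * weights.get(c, 1) for c, n in hits.items())


doc = {"title": "Swish-e parser notes", "h1": "Parser", "body": "the parser indexes text"}
print(score(doc, "parser"))                               # no special treatment
print(score(doc, "parser", weights={"title": 3, "h1": 2}))  # headings weighted
print(score(doc, "parser", limit_to={"title"}))           # limited to one context
```

With no weights the heading hit counts exactly like the body hit, which mirrors the "nothing special by default" behavior. Extra weight for headings only appears when you ask for it.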
FWIW, I use pdftotext and pdfinfo via the SWISH::Filters::Pdf2HTML filter:
http://cpansearch.perl.org/src/KARMAN/SWISH-Filter-0.15/lib/SWISH/Filters/Pdf2HTML.pm
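The general approach of combining pdfinfo metadata with pdftotext body text into one HTML document can be sketched in Python. This is not the Pdf2HTML filter's code, and the exact HTML layout below (document Title in `<title>`, other pdfinfo fields as `<meta>` tags, body in `<pre>`) is an assumption for illustration. The idea is that the PDF title ends up in its own context, which you could then limit or weight if you declare it as a MetaName. `pdf_to_html` assumes the `pdfinfo` and `pdftotext` command-line tools are on the PATH. `pdftotext file.pdf -` writes the extracted text to stdout.

```python
import html
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Parse pdfinfo's 'Key:   value' lines into a dict (split at the first colon)."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def build_html(info: dict, body_text: str) -> str:
    """Wrap metadata and plain text in a minimal HTML document (assumed layout)."""
    title = html.escape(info.get("Title", ""))
    metas = "\n".join(
        f'<meta name="{html.escape(k, quote=True)}" content="{html.escape(v, quote=True)}">'
        for k, v in info.items()
        if k != "Title"
    )
    return (
        f"<html><head><title>{title}</title>\n{metas}\n</head>"
        f"<body><pre>{html.escape(body_text)}</pre></body></html>"
    )


def pdf_to_html(path: str) -> str:
    """Run pdfinfo and pdftotext on `path` and combine their output as HTML."""
    info_out = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    ).stdout
    body = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return build_html(parse_pdfinfo(info_out), body)
```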
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Sep 1 23:30:07 2010