Re: [swish-e] Current favorite PDF filter on FreeBSD?

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Sep 02 2010 - 03:30:03 GMT

David Brown wrote on 9/1/10 8:26 PM:

> Side question: I thought I read somewhere that swish-e's HTML parser will
> weight text in headings more heavily than in regular text, but I'm not
> finding that in the documentation. Is this in fact the case? If not, then I
> might as well just stick with pdftotext.  If it does, then I'll try harder
> to get pdftohtml 0.40a compiled or look into any alternatives you all might
> suggest.

you can limit or weight matching terms based on their context (MetaName), but
the parser doesn't treat headings specially by default.
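
As a rough sketch of what MetaName-based weighting might look like in a swish-e
config file: this assumes the MetaNames and MetaNamesRank directives from the
SWISH-CONFIG docs (MetaNamesRank is described there as experimental). The
metaname "subject" and the bias value 4 are illustrative only, not taken from
any real setup.

```
# Hypothetical config fragment: names and values are illustrative.

# Index content found under this metaname, e.g. <meta name="subject" ...>
MetaNames subject

# Bias ranking for terms matched under that metaname (positive raises rank)
MetaNamesRank 4 subject
```

A search could then be limited to that context with something like
`swish-e -w 'subject=foo'`. Heading text would only benefit if the filter's
output placed it under a declared metaname.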

fwiw, I use pdftotext and pdfinfo via the SWISH::Filters::Pdf2HTML filter.

http://cpansearch.perl.org/src/KARMAN/SWISH-Filter-0.15/lib/SWISH/Filters/Pdf2HTML.pm
-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Sep 1 23:30:07 2010