
Re: [swish-e] Current favorite PDF filter on FreeBSD?

From: Peter Karman <peter(at)>
Date: Thu Sep 02 2010 - 03:30:03 GMT
David Brown wrote on 9/1/10 8:26 PM:

> Side question: I thought I read somewhere that swish-e's HTML parser will
> weight text in headings more heavily than in regular text, but I'm not
> finding that in the documentation. Is this in fact the case? If not, then I
> might as well just stick with pdftotext.  If it does, then I'll try harder
> to get pdftohtml 0.40a compiled or look into any alternatives you all might
> suggest.

You can limit or weight matching terms based on their context (MetaName), but
the parser doesn't do anything special with headings by default.

FWIW, I use pdftotext and pdfinfo via the SWISH::Filters::Pdf2HTML filter.
Peter Karman  .  .  peter(at)
Received on Wed Sep 1 23:30:07 2010