Re: [swish-e] Filters

From: Peter Karman <peter(at)>
Date: Mon May 12 2008 - 17:58:27 GMT

On 05/12/2008 11:25 AM, Francisco M. Vives wrote:
> Hi Guys,
> Is there a place with all the available filters to use with SWISH-E?
> I need to know all the types of files that can be filtered in that way, 
> for example, is there any filter that performs OCR on jpg files that can 
> be used with SWISH-E?
> One more thing, how flexible is SWISH-E with filters when using 
> different versions of the filters?
> Last time I tried to index a PDF document I got some errors with the 
> filter that seemed to happen because the filter didn't work for that 
> version of PDF. So, what happens if I try to use the newest 
> pdftotext.exe and where can anybody get the updated versions of the filters?


Judging from your question about pdftotext.exe, I assume you are using Windows. I believe
all the available filters are included with the Swish-e Windows installer. The list is
fairly modest. Look at the filters/swish-filter-test script in the distribution.

As for OCR on jpg files, I do not know of any filter for that. The Swish-e filters just use
other, third-party programs to convert non-text formats into text, HTML, or XML for
Swish-e to parse. So if you can find a piece of software that does the OCR, you can write
a filter for it for Swish-e.
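
To make the idea concrete, here is a rough sketch of what such a wrapper could look like.
It is not part of Swish-e. It assumes an OCR program that prints recognized text to stdout
when called as `<program> <image> stdout` (the Tesseract command-line tool works this
way); the script name ocr_filter.py and all function names are made up for illustration.
The wrapper would still need to be hooked into Swish-e, for example through the FileFilter
configuration directive, so check the Swish-e documentation for the exact syntax.

```python
import subprocess
import sys


def build_ocr_command(image_path, ocr_program="tesseract"):
    """Return the argv list for an OCR tool that writes text to stdout.

    Assumption: the tool accepts `<program> <image> stdout`, as the
    Tesseract command-line program does. Adjust for other tools.
    """
    return [ocr_program, image_path, "stdout"]


def normalize_text(raw):
    """Collapse runs of whitespace so the indexer receives clean plain text."""
    return " ".join(raw.split())


def run_filter(command):
    """Run the external program and return its normalized stdout.

    Raises subprocess.CalledProcessError if the program exits non-zero.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return normalize_text(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: python ocr_filter.py scan.jpg
    print(run_filter(build_ocr_command(sys.argv[1])))
```

The OCR command is built separately from the code that runs it, so a different OCR
program can be swapped in by changing only build_ocr_command.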

