Re: swish-e 2.4.3 windows 2003 iis success!

From: Bill Moseley <moseley(at)>
Date: Wed Jun 22 2005 - 21:46:37 GMT
On Wed, Jun 22, 2005 at 05:05:53PM -0400, Revillini, James wrote:
> RTF's are killing it now.  As soon as it runs into one, the output file
> from goes like this:

By the way, this is all in the docs, but here's a quick executive
summary: finds files and then passes the file name to SWISH::Filter.

SWISH::Filter uses MIME::Types to look up the mime type of the file.
Then all the available SWISH::Filter modules are scanned for a regular
expression that matches the file's mime type.  When one is found, that
filter is used, and it changes the content type to something else
(like text/plain or text/html).
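
To make the lookup-then-dispatch flow concrete, here is a minimal
standalone Perl sketch.  It is not SWISH::Filter's code: a small
hard-coded extension table stands in for MIME::Types, and the filter
names (DocFilter, PdfFilter), the pick_filter helper, and the content
type each filter produces are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MIME::Types: map a file extension to a mime type.
# (SWISH::Filter asks MIME::Types; this table is only for illustration.)
my %mime_by_ext = (
    doc  => 'application/msword',
    pdf  => 'application/pdf',
    html => 'text/html',
    txt  => 'text/plain',
);

# Hypothetical filter list: each entry has a regex the input mime type must
# match and the content type the filter produces.  These are made-up names,
# not SWISH::Filter's real modules or API.
my @filters = (
    { name => 'DocFilter', accepts => qr{^application/msword$}, produces => 'text/plain' },
    { name => 'PdfFilter', accepts => qr{^application/pdf$},    produces => 'text/html'  },
);

# Look up the mime type, then scan the filters for the first one whose regex
# matches it.  Returns (filter name or undef, resulting content type or undef).
sub pick_filter {
    my ($path) = @_;
    my ($ext) = $path =~ m{\.([^./]+)$};
    my $type = defined $ext ? $mime_by_ext{ lc $ext } : undef;
    return ( undef, undef ) unless defined $type;    # unknown extension
    for my $f (@filters) {
        return ( $f->{name}, $f->{produces} ) if $type =~ $f->{accepts};
    }
    return ( undef, $type );    # no filter matched: type is left unchanged
}

for my $path (qw(report.doc manual.pdf index.html)) {
    my ( $filter, $type ) = pick_filter($path);
    printf "%-11s %-10s -> %s\n", $path, $filter // '(none)', $type;
}
```

Running it prints which hypothetical filter, if any, each file would go
through and the content type that results; a file with no matching
filter keeps its original type.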

The individual filters normally need helper programs, like catdoc, to
be installed before they will work.  The swish distribution on windows
includes catdoc, IIRC.

When SWISH::Filter is done then skips any files that are
"binary", which only means they are not of some kind of text/* type.
Really, it should skip everything except text/xml, text/plain, and
text/html, as those are the only types swish can index.  After all,
there are a lot of other text/* types:
    $ fgrep 'text/' /etc/mime.types | wc -l

You might want to add that test into -- check for only
those three mime types:

    unless ( $doc->content_type =~ m!^text/(?:plain|xml|html)$! ) {
        warn "Can't index $path because it's " . $doc->content_type . "\n";
        return;    # skip the file (use "next" instead if inside a loop)
    }
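
The difference between the two checks shows up with text types swish
can't parse.  Here is a minimal standalone Perl sketch that works only
on content-type strings; skipped_as_binary and is_indexable are
hypothetical helpers, not code from the script, and text/rtf and
text/css are just example types.

```perl
use strict;
use warnings;

# Current rule as described: anything that is not text/* counts as
# "binary" and is skipped.  (Hypothetical helper name.)
sub skipped_as_binary {
    my ($type) = @_;
    return $type !~ m!^text/!;
}

# Suggested rule: only text/plain, text/xml and text/html are indexable.
# (Hypothetical helper name.)
sub is_indexable {
    my ($type) = @_;
    return $type =~ m!^text/(?:plain|xml|html)$!;
}

for my $type (qw(text/html text/plain text/xml text/rtf text/css application/pdf)) {
    printf "%-16s binary=%-3s indexable=%s\n", $type,
        skipped_as_binary($type) ? 'yes' : 'no',
        is_indexable($type)      ? 'yes' : 'no';
}
```

Types such as text/rtf and text/css get past the "binary" test, so they
are not skipped, yet they fail the three-type check.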

Anyway, that's how it all works.

Bill Moseley

Received on Wed Jun 22 14:46:38 2005