indexing files with multiple MIME parts

From: Andy Jacobson <andyj(at)not-real.splash.princeton.edu>
Date: Tue Aug 17 2004 - 14:41:58 GMT
Hi,

        I've been using swish-e to index my email for some time now,
        but without paying any attention to the binary MIME
        attachments that the messages contain.  Now I would like to
        index the MS-Word and PDF attachments as well.

        I've written a perl script that uses MIME attachment
        processing code from CPAN to extract the attachments and hand
        them off to SWISH::Filter for filtering.  So far, so good; all
        the parts are processed properly and swish-e inputs are
        produced.  

        However, each MIME message will produce at least two
        parts, with different content types.  The email text will be
        text/plain, but perhaps the filtered PDF will be HTML.  So one
        email would produce multiple outputs, something like:

Path-Name: ./1964
Content-Length: 1001
Last-Mtime: 1092715695
Document-Type: TXT*

text text text ...

Path-Name: ./1964
Content-Length: 49099
Last-Mtime: 1092753193
Document-Type: HTML*

<html> ..... </html>

       Can swish-e handle this?  Two separate inputs for the same
       file?  Can those outputs be of different content types?  I
       suppose the alternative is to attempt to convert everything to
       text/plain, combine content lengths, and feed swish-e just one
       input per file.
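
       A minimal sketch of that single-input alternative, in Python
       rather than the perl script described above.  The names
       html_to_text and combined_record are hypothetical, and the
       tag stripping is deliberately crude (it would also keep the
       contents of script and style elements).  The sketch assumes
       Content-Length should be the byte count of the combined body,
       so it is recomputed after conversion rather than summed from
       the per-part lengths:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def combined_record(path, mtime, parts):
    """Build one TXT* record (as bytes) from (content_type, text) pairs.

    Hypothetical helper: every part is reduced to plain text, the parts
    are joined with blank lines, and a single header block is emitted.
    """
    texts = []
    for content_type, text in parts:
        if content_type == "text/html":
            text = html_to_text(text)
        texts.append(text)
    body = "\n\n".join(texts).encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # bytes in the body, not characters
        f"Last-Mtime: {mtime}\n"
        "Document-Type: TXT*\n"
        "\n"
    ).encode("utf-8")
    return header + body
```

       Writing the returned bytes to standard output, one record per
       message, would give swish-e a single input per file.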

       Thanks,

                Andy
-- 
Andy Jacobson

andyj@aos.princeton.edu

Program in Atmospheric and Oceanic Sciences
Sayre Hall, Forrestal Campus
Princeton University
PO Box CN710 Princeton, NJ 08544-0710 USA

Tel: 609/258-5260  Fax: 609/258-2850
Received on Tue Aug 17 07:42:18 2004