
Re: Swish-e and OpenDocument and metadata

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 04 2005 - 14:53:45 GMT
On Fri, Nov 04, 2005 at 04:56:46AM -0800, Lars D. Noodén wrote:
> Does the filter need to return the real mime type[1]?

All this could be improved vastly.  The filters register regular
expressions for the mime types they handle.  If the incoming document's
mime type matches a filter's pattern, then that filter is passed the
incoming content.

The filter then converts it into another mime type (normally that
means text/{html|xml|txt}) and returns that text.
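A minimal Python sketch of that dispatch, for illustration only. SWISH::Filter itself is Perl, and the names here (`register`, `run_filters`, `opendocument_stub`) are hypothetical. It shows the two ideas described above: each filter is registered under a regular expression for the mime types it handles, and a matching filter returns both a new mime type and the converted text.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical converter signature: raw bytes in, (new_mime_type, text) out.
Converter = Callable[[bytes], Tuple[str, str]]

# Registry of (mime-type regex, converter) pairs, checked in registration order.
FILTERS: List[Tuple[re.Pattern, Converter]] = []

def register(pattern: str) -> Callable[[Converter], Converter]:
    """Decorator: register a converter for mime types matching `pattern`."""
    def wrap(fn: Converter) -> Converter:
        FILTERS.append((re.compile(pattern), fn))
        return fn
    return wrap

@register(r"^application/vnd\.oasis\.opendocument\.")
def opendocument_stub(content: bytes) -> Tuple[str, str]:
    # Placeholder: a real filter would unzip the file and extract content.xml.
    return "text/xml", "<doc>" + content.decode("utf-8", "replace") + "</doc>"

def run_filters(mime: str, content: bytes) -> Optional[Tuple[str, str]]:
    """Hand the content to the first filter whose regex matches `mime`."""
    for regex, convert in FILTERS:
        if regex.search(mime):
            return convert(content)
    return None  # no filter handles this mime type
```

Returning the new mime type alongside the text is what lets the caller decide what to do with the result, which is the point of the next paragraph.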

In the case of the spider (which uses SWISH::Filter for its filtering
needs), if the returned document is of type text/* then the document is
passed on to swish.

So, yes, you would need to set the mime type after filtering.
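A hypothetical Python sketch (not the spider's actual code) of why the mime type matters: a filter that converts the content but still reports the original mime type is silently dropped, because only text/* results are handed on. The function names are made up for illustration.

```python
from typing import Callable, Optional, Tuple

# A converter takes (mime_type, content) and returns (new_mime_type, text).
Converter = Callable[[str, bytes], Tuple[str, str]]

def filter_then_index(mime: str, content: bytes,
                      convert: Converter) -> Optional[Tuple[str, str]]:
    """Run the converter; pass the result on only if its reported type is text/*."""
    new_mime, text = convert(mime, content)
    if new_mime.lower().startswith("text/"):
        return new_mime, text
    return None  # not text/*: the spider would not pass this to swish

def odt_filter_forgot_type(mime: str, content: bytes) -> Tuple[str, str]:
    # Converts the content but returns the ORIGINAL mime type (the bug).
    return mime, content.decode("utf-8")

def odt_filter_sets_type(mime: str, content: bytes) -> Tuple[str, str]:
    # Converts the content and reports the new mime type.
    return "text/xml", content.decode("utf-8")
```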

Looking at the existing filters might explain it better.

> Can swish-e process two separate XML files (content + metadata) as one
> if they are concatenated?

Yes, if the resulting document is a valid XML file.  You might want to use
one of the XML parsers to merge the documents correctly.
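Plain concatenation of two XML files gives two root elements (and possibly two XML declarations), which is not well-formed. A minimal sketch of merging with a parser instead, using Python's standard `xml.etree.ElementTree`; the wrapper element name `document` is an arbitrary assumption. Note that ElementTree may rewrite namespace prefixes (e.g. `ns0:`) when serializing namespaced OpenDocument XML.

```python
import xml.etree.ElementTree as ET

def merge_xml(content_xml: str, meta_xml: str, root_tag: str = "document") -> str:
    """Wrap two parsed XML documents under one new root element so the
    result is a single well-formed document (root_tag is an arbitrary choice)."""
    root = ET.Element(root_tag)
    root.append(ET.fromstring(content_xml))  # parse, so malformed input fails early
    root.append(ET.fromstring(meta_xml))
    return ET.tostring(root, encoding="unicode")
```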

One thing the filters are not set up to do is take a single file
(like a tar or zip file) and index its contents as separate files.  They
should do that, but they don't.

That can easily be hacked with spider.pl: because swish is connected to
its stdout, all you have to do is correctly format each document (add a
few headers) and send it to stdout, and it will get indexed.
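A rough sketch of that hack as a standalone Python program rather than a change to spider.pl: it splits a zip file into members and writes each one to stdout as its own document. The header names (Path-Name, Content-Length, Last-Mtime, then a blank line before the body) are assumed from the format spider.pl emits for swish-e's "-S prog" input; check the swish-e docs for the exact set. The `archive#member` path naming is hypothetical.

```python
import sys
import time
import zipfile
from typing import BinaryIO

def emit_prog_doc(out: BinaryIO, path: str, body: bytes, mtime: int) -> None:
    """Write one document as headers, a blank line, then the raw body
    (header names assumed from what spider.pl emits)."""
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        f"Last-Mtime: {mtime}\n"
        "\n"
    )
    out.write(headers.encode("utf-8"))
    out.write(body)

def index_zip_members(zip_path: str, out: BinaryIO) -> int:
    """Send each regular file in the archive to swish as a separate document."""
    count = 0
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            body = zf.read(info)
            # ZipInfo.date_time is a local-time 6-tuple; convert to epoch seconds.
            mtime = int(time.mktime(info.date_time + (0, 0, -1)))
            # Hypothetical naming scheme: archive path + "#" + member name.
            emit_prog_doc(out, f"{zip_path}#{info.filename}", body, mtime)
            count += 1
    return count

if __name__ == "__main__":
    index_zip_members(sys.argv[1], sys.stdout.buffer)
```

Tar files could be handled the same way with the standard `tarfile` module.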

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Nov 4 06:53:45 2005