On Apr 8, 2008, at 7:11 PM, Peter Karman wrote:
>> OTOH, it seems that there are repeated inquiries on the list about
>> how to insert meta data about the document into the index. Often we
>> know things about the document that are not included in the document
>> itself, and it seems that an extension of the existing filtering
>> mechanism might be useful.
> see URL above. That version of SWISH::Filter needs to get merged back
> into the Swish-e dist. It definitely will in 2.6; not sure if it will
> in 2.4.x.

Hmm. I've just about finished hacking spider.pl to add another
user-defined callback function that lets me insert the additional
attributes into ALL documents, including the TEXT/HTML types that are
normally not filtered.

But it looks like the meta_data() method would instead allow me to
build a filter that inserts the attributes as meta data. I take it I
would need to update the existing filters (such as pdf2html) to use
set_continue, so that after type conversion my attribute_insertion
filter gets called?
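
For illustration, the attribute-insertion step itself could look roughly like the sketch below. It is plain Perl with no Swish-e calls: insert_meta_tags and escape_attr are hypothetical helpers, not part of spider.pl or SWISH::Filter, and how the routine gets wired into a callback or a filter is left out. It also assumes the document is already HTML and that the attribute names are declared as MetaNames in the Swish-e config, so the parser indexes the added <meta> tags.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe inside an HTML attribute value.
sub escape_attr {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper (not part of Swish-e): add one <meta> tag per
# attribute to the HTML in $$content_ref, modifying it in place.
sub insert_meta_tags {
    my ($content_ref, %attrs) = @_;

    my $tags = '';
    for my $name (sort keys %attrs) {
        $tags .= sprintf qq{<meta name="%s" content="%s">\n},
            escape_attr($name), escape_attr($attrs{$name});
    }

    # Put the tags right after <head ...> when there is one;
    # otherwise wrap them in a new <head> at the top of the document.
    unless ($$content_ref =~ s/(<head\b[^>]*>)/$1\n$tags/i) {
        $$content_ref = "<head>\n$tags</head>\n" . $$content_ref;
    }
    return 1;
}

# Example with made-up attribute names and values.
my $html = "<html><head><title>Report</title></head><body>text</body></html>";
insert_meta_tags(\$html, department => 'sales', owner => 'R&D');
print $html, "\n";
```

Taking a reference to the content means the same routine could be called from either place, a spider callback or a filter, once the document has been converted to HTML.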
Received on Tue Apr 8 22:36:24 2008