William M Conlon wrote on 4/8/08 12:43 AM:
> I took a look at the source, and while it's straightforward to
> capture the meta data in extprog.c, feeding these attributes into the
> parser while it's evaluating the document requires the same work as
> doing it in a perl callback, where it's far easier.
>
Agreed. See, e.g.,
http://search.cpan.org/~karman/SWISH-Filter-0.09/lib/SWISH/Filter/Document.pm#meta_data
> OTOH, it seems that there are repeated inquiries on the list about
> how to insert meta data about the document into the index. Often we
> know things about the document that are not included in the document
> itself, and it seems that an extension of the existing filtering
> mechanism might be useful.
See the URL above. That version of SWISH::Filter needs to get merged back into the
Swish-e dist. It definitely will in 2.6; not sure if it will in 2.4.x.
>
> To me it would be ideal to be able to feed two streams into swish-e:
> * one stream is the [filtered] content.
> * the second stream consists of document attributes that are not
> contained in the document itself.
>
Yes, the current architecture requires that all data be in the 'document', so
assigning arbitrary meta data (to be stored as MetaNames and/or Properties)
requires insertion into the content stream. That's the 2.x paradigm.
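A rough sketch of what "insertion into the content stream" can look like for a
-S prog feed, assuming the content is HTML and the extra attributes are written
as <meta> tags in the head. The function names (merge_attributes, prog_record)
are made up for illustration and are not part of Swish-e or SWISH::Filter; the
Path-Name, Content-Length and Document-Type headers are the ones the prog
method reads. The attribute names would still need to be listed under
MetaNames and/or PropertyNames in the config to be indexed or stored.

```python
import re
from html import escape

# Matches an opening <head> tag (with or without attributes), but not <header>.
HEAD_TAG = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def merge_attributes(html_doc: str, attributes: dict) -> str:
    """Return html_doc with each attribute added as a <meta> tag.

    Hypothetical helper: tags go right after the opening <head>; if the
    document has no <head>, it is wrapped in a minimal HTML skeleton.
    """
    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(
            escape(str(name), quote=True), escape(str(value), quote=True)
        )
        for name, value in attributes.items()
    )
    match = HEAD_TAG.search(html_doc)
    if match is None:
        return "<html><head>\n" + tags + "</head><body>\n" + html_doc + "</body></html>\n"
    return html_doc[: match.end()] + "\n" + tags + html_doc[match.end():]


def prog_record(path: str, html_doc: str, attributes: dict) -> bytes:
    """Build one record for a -S prog feed: headers, blank line, content."""
    body = merge_attributes(html_doc, attributes).encode("utf-8")
    header = (
        "Path-Name: {}\n"
        "Content-Length: {}\n"  # length in bytes, not characters
        "Document-Type: HTML*\n"
        "\n"
    ).format(path, len(body)).encode("utf-8")
    return header + body
```

A program run under -S prog would write the bytes from prog_record() to
stdout, one record per document. Treat this as an outline of the 2.x
workaround, not as how SWISH::Filter does it.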
> For now, I can take these two streams and merge them before
> indexing. But perhaps the distinction between information in the
> document and information about the document could be worked into your
> Swish3 proposal?
>
Your idea will be implemented in Swish3, and in fact KinoSearch (one of the two
current backend targets) is already designed with the field/value API in mind.
The Swish3 Perl implementation will allow for storing field/value pairs at
indexing time, outside of the 'document' content per se.
So good idea, Bill. ;)
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Tue Apr 8 22:11:11 2008