Best to copy the list on these things, for the sake of those who come
after us.
swishe supposedly wrote on 04/01/2004 01:12 PM:
>>>(1) How can we limit search results to records having date >= <datestring>
>>
>>Does the -L option help at all? It's listed as experimental, but the
>>docs suggest that a range of dates is the intended function.
>
>
> Thank you very much - it worked even if it's not as fast as all the
> other searches we've tested. But I suppose that you have to do something
> like a "full table scan".
glad that worked.
>
>
>>>(2) How can we search combinations of attributes in one subrecord?
>>> e.g. -w attribute1=A attribute2=B
>>> In our tests swish-e also finds record 2 but we only want to get
>>> record 1
>>> cause only there "A / B" is found in one subrecord.
>>>
>>
>>You might try using the -S prog method to split up your subrecords into
>>actual, distinct xml "files". That way each one would be a distinct
>>"file" and could be returned that way.
>
>
> OK, but what I really want to do is - speaken in SQL - a join
> of two different types of records (or entities).
> My first entity (record) may have n subrecords (entity 2) - 1:n.
> I'm looking for a way to select records matching a combination
> of record-attributes joined with subrecords matching a combination of
> subrecord attributes (combination means boolean AND).
> Example:
> - records are book titles
> - subrecords contain information about books of a specific library
> e.g. signature, location, field of research (chemistry, physics,
> comp. science etc.)
> Now I want to find books using title keywords or author names etc.
> for a specific field of research at a specific location.
If I understand you correctly, I think you have to do two things: make
your XML more descriptive (unique) and perhaps manipulate the results
after you have them. There's no sense of "different types of records" in
a single swish index. You could, I suppose, create multiple indexes of
different kinds (titles.index and info.index) and then merge the
properties back together to form a virtual table.
But I think you'd probably be better off doing that with a real SQL
database. swish is really good at indexing and searching text, but it
assumes relationships between data are consistent through a single set
of documents.
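
If you do go the two-index route, the "merge back together" step could be sketched roughly like this in Python. Everything here is an assumption for illustration: the hit lists are hand-written dicts standing in for whatever properties you would read out of each index's results, and join_results is a hypothetical helper, not part of swish-e.

```python
from collections import defaultdict

def join_results(title_hits, info_hits, key="id"):
    """Inner-join two lists of result dicts on a shared key (1:n)."""
    # Group the subrecord hits by the key they share with the title hits.
    by_key = defaultdict(list)
    for sub in info_hits:
        by_key[sub[key]].append(sub)

    joined = []
    for title in title_hits:
        # One output row per (title, matching subrecord) pair.
        for sub in by_key.get(title[key], []):
            row = dict(title)
            row.update(sub)
            joined.append(row)
    return joined

# Hypothetical hits, as if returned from titles.index and info.index.
title_hits = [
    {"id": "1", "title": "Organic Chemistry"},
    {"id": "2", "title": "Quantum Physics"},
]
info_hits = [
    {"id": "1", "location": "main", "field": "chemistry"},
]

rows = join_results(title_hits, info_hits)
```

Only titles that have at least one matching subrecord hit survive the join, which is the boolean AND across the two entity types that you describe.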
your example---
configuration:
IndexDir .
IndexOnly .xml
IndexContents XML2 .xml
UndefinedMetaTags auto
UndefinedXMLAttributes auto
PropertyNames date attribute1 attribute2
2 xml records:
<record>
<id>1</id>
<date>20040213</date>
... more record elements
<subrecord>
<attribute1> A </attribute1>
<attribute2> B </attribute2>
</subrecord>
<subrecord>
<attribute1> C </attribute1>
<attribute2> D </attribute2>
</subrecord>
</record>
<record>
<id>2</id>
<date>20040115</date>
... more record elements
<subrecord>
<attribute1> A </attribute1>
<attribute2> D </attribute2>
</subrecord>
<subrecord>
<attribute1> C </attribute1>
<attribute2> B </attribute2>
</subrecord>
</record>
--end your example
The docs say this (under PropertyNames):
If Swish-e finds more than one property of the same name in a document
the property's contents will be concatenated for strings, and a warning
issued for numeric (or date) properties.
I understand that as, according to your example XML, if two attribute1
tags appear in a document, their contents are captured together in a
single property. That makes them virtually useless to you.
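
To make that concrete, here is a small Python model of the effect on your record 2. It is only a sketch of my reading of the docs, not swish-e's actual code; merged_properties is a hypothetical helper, and joining the values with a single space is an assumption.

```python
import xml.etree.ElementTree as ET

RECORD_2 = """<record>
<id>2</id>
<date>20040115</date>
<subrecord><attribute1> A </attribute1><attribute2> D </attribute2></subrecord>
<subrecord><attribute1> C </attribute1><attribute2> B </attribute2></subrecord>
</record>"""

def merged_properties(xml_text, names):
    """Collect every occurrence of each tag into one string per name."""
    root = ET.fromstring(xml_text)
    return {
        name: " ".join(elem.text.strip() for elem in root.iter(name))
        for name in names
    }

props = merged_properties(RECORD_2, ["attribute1", "attribute2"])

# Once merged, nothing records which subrecord a value came from, so a
# query for attribute1=A AND attribute2=B matches record 2 as well.
matches = ("A" in props["attribute1"].split()
           and "B" in props["attribute2"].split())
```

In this model record 2 matches even though A and B never occur in the same subrecord, which is the false hit you are seeing.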
If I had those XML records as you describe, I might try flipping them
inside out so as to make them smaller and thus more unique.
Take your current <record> number two and split it into two:
<subrecord>
<id>2</id>
<date>20040115</date>
<attribute1> C </attribute1>
<attribute2> B </attribute2>
</subrecord>
<subrecord>
<id>2</id>
<date>20040115</date>
<attribute1> A </attribute1>
<attribute2> D </attribute2>
</subrecord>
that might let you manipulate a little more with swish.
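
A rough Python sketch of that flattening step, in case it helps. It assumes each file holds one <record> shaped like your example; split_record is a hypothetical helper, and feeding the resulting documents to swish-e (for instance through -S prog) is left out.

```python
import copy
import xml.etree.ElementTree as ET

RECORD = """<record>
<id>2</id>
<date>20040115</date>
<subrecord><attribute1> A </attribute1><attribute2> D </attribute2></subrecord>
<subrecord><attribute1> C </attribute1><attribute2> B </attribute2></subrecord>
</record>"""

def split_record(xml_text):
    """Return one flat <subrecord> document per subrecord in a record."""
    record = ET.fromstring(xml_text)
    # Record-level elements (id, date, ...) are repeated in every document.
    shared = [child for child in record if child.tag != "subrecord"]

    docs = []
    for sub in record.findall("subrecord"):
        flat = ET.Element("subrecord")
        for child in shared:
            flat.append(copy.deepcopy(child))
        for child in sub:
            flat.append(copy.deepcopy(child))
        docs.append(ET.tostring(flat, encoding="unicode"))
    return docs

docs = split_record(RECORD)
```

Each resulting document carries the parent id, so you can still group the hits back to the original record after the search.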
pek
--
Peter Karman - Software Publications Programmer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Thu Apr 1 11:52:44 2004