Re: [swish-e] indexing performance expectations

From: Patrick May <patrick(at)not-real.hexane.org>
Date: Tue Jul 22 2008 - 23:51:31 GMT
Oh, and I am using the file system method, with a directory of specially
formatted intermediate files.  My experience with the prog method has
been that it put too much work onto the index process and proved unreliable
for large amounts of content.

Thanks again for your help!

Cheers,

Patrick



On Tue, Jul 22, 2008 at 7:49 PM, Patrick May <patrick@hexane.org> wrote:

> Brad and Peter,
>
> Thanks for the feedback.  It does indeed take ~ 2h to index all my
> documents.  Thanks also for the suggestion to use intermediate directories
> and do a merge.  I can see that trimming the index time down quite a bit.
>
> Cheers,
>
> Patrick
>
>
>
> On Sun, Jul 13, 2008 at 6:43 PM, Peter Finch <PFinch@cch.com.au> wrote:
>
>> Hi Patrick,
>>
>> We have a similar problem; we have 900,000+ documents at over 4GB.
>>
>> Fortunately for me the documents are grouped into directories and I
>> only reindex the groups that change into an "intermediary" index (I
>> actually use a Makefile to detect which directories were updated).
>> Then I merge all the intermediary indexes into the final index. It
>> still takes a while (~1 hour on a sparc V210) but it's faster than
>> doing it all from scratch.
>>
>> On average it's faster to merge; however, if everything changes then it
>> actually takes longer... fortunately, that does not happen very often.
>>
>> Also, be careful with the number of "intermediary" indexes as Swish can
>> only merge a few dozen at once.
>>
>> I hope this helps.
>>
>> Regards,
>>
>> Peter Finch
>>
>> ------------------------------
>>
>> From: users-bounces@lists.swish-e.org
>> [mailto:users-bounces@lists.swish-e.org] On Behalf Of Patrick May
>> Sent: Saturday, 12 July 2008 12:26 AM
>> To: users@lists.swish-e.org
>> Subject: [swish-e] indexing performance expectations
>>
>> Hello,
>>
>> How should I expect indexing to perform when indexing 900,000+ very small
>> documents (256 MB)?  Thus far, my observation is that it takes a while.
>> Could it be helpful to move to an incremental format?
>>
>> Cheers,
>>
>> ~ p
>>
>>
>> --
>> Patrick May
>> 135 Oak Street
>> New York, NY 11222
>> +1 (347) 232-5208
>> patrick@hexane.org
>> http://www.hexane.org
>>
>>
>
>
>



-- 
Patrick May
135 Oak Street
Brooklyn, NY 11222
+1 (347) 232-5208
patrick@hexane.org
http://www.hexane.org


_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Jul 22 19:51:36 2008
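
The workflow Peter describes in the quoted message (reindex only the
document groups that changed, then merge the intermediary indexes a
limited number at a time) might be planned along these lines. This is
only a rough sketch, not Peter's actual Makefile: the function names,
the "<group>.index" naming scheme and the batch size of 20 are all
assumptions made for illustration, and the script only decides what
needs doing. It does not run Swish-e itself.

```python
import os
from pathlib import Path
from typing import List


def newest_mtime(directory: Path) -> float:
    """Return the latest modification time of the directory tree.

    Directory timestamps are included so that deleted files are noticed too.
    """
    newest = 0.0
    for root, _dirs, files in os.walk(directory):
        newest = max(newest, os.stat(root).st_mtime)
        for name in files:
            newest = max(newest, (Path(root) / name).stat().st_mtime)
    return newest


def stale_groups(doc_root: Path, index_dir: Path) -> List[str]:
    """Name the group directories whose intermediary index is missing or older
    than the newest change inside the group (the same test make would apply)."""
    stale = []
    for group in sorted(p for p in doc_root.iterdir() if p.is_dir()):
        # Hypothetical naming scheme: one "<group>.index" file per directory.
        index_file = index_dir / (group.name + ".index")
        if not index_file.exists():
            stale.append(group.name)
        elif newest_mtime(group) > index_file.stat().st_mtime:
            stale.append(group.name)
    return stale


def merge_batches(indexes: List[str], batch_size: int = 20) -> List[List[str]]:
    """Split the index list into batches of at most batch_size entries.

    The default of 20 is an assumed stand-in for "a few dozen at once".
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [indexes[i:i + batch_size] for i in range(0, len(indexes), batch_size)]
```

Each name returned by stale_groups() would be reindexed into its own
intermediary index, and each list from merge_batches() would be merged in
a separate pass. If there is more than one batch, the per-batch results
need one further merge to produce the final index.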