Re: performance aspects

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu May 27 2004 - 13:33:06 GMT
On Thu, May 27, 2004 at 12:29:28AM -0700, swishe wrote:
> But we've noticed the following behaviour:
> When doing a boolean search like "key1=value1 AND key2=value2" it seems that
> swish-e is doing the key2=value2 search even if the first part of
> this query (key1=value1) does not find any results.

Yes, that's likely true because the parser currently isn't smart enough
to look ahead in the query.
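
To illustrate the difference (a toy sketch only, not Swish-e's parser code): the index contents, the `lookup` helper, and both function names below are hypothetical. The first function evaluates both sides of the AND before combining them; the second skips the right-hand lookup when the left-hand side is empty.

```python
# Hypothetical toy index: query term -> set of file numbers.
# This is NOT Swish-e's on-disk layout, just enough to show the idea.
INDEX = {
    "key1=value1": set(),        # no matches
    "key2=value2": {1, 2, 3},
}

lookups = []  # records which terms were actually looked up


def lookup(term):
    """Fetch the file numbers for one term and log the access."""
    lookups.append(term)
    return INDEX.get(term, set())


def and_eager(left, right):
    """Evaluate both operands, then intersect (the behaviour reported above)."""
    a = lookup(left)
    b = lookup(right)
    return a & b


def and_short_circuit(left, right):
    """Skip the right-hand lookup when the left-hand side is already empty."""
    a = lookup(left)
    if not a:
        return set()
    return a & lookup(right)
```

Both functions return the same result; the only difference is whether the second term is looked up when the first finds nothing.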

> Would it be possible to fix this?

Yes, that is the plan.  (You can look back at the archives and see how
often the topic of updating the parser is discussed.)

> A "full table scan" looking for words beginning with 1 or 2 specific chars
> takes a long time. Our word index contains about 30 million items.

Do you mean wild card searches like foo*?  Yes, those are slow because
the hash table for wildcard searches is only 256 elements long and with
your large data set there's a lot of walking the index.
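
For a sense of scale: 30 million words spread evenly over 256 buckets is roughly 117,000 words per bucket, and a prefix search has to walk all of them. The sketch below is only an illustration. It assumes the bucket is picked by the first byte of the word, which is a guess and not taken from the Swish-e source, and every function name is hypothetical. It contrasts that walk with a binary search over a sorted word list.

```python
import bisect

NUM_BUCKETS = 256  # table size mentioned above


def build_buckets(words):
    """Assumption: bucket = first byte of the (non-empty) word."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for w in words:
        buckets[w.encode("utf-8")[0]].append(w)
    return buckets


def prefix_scan(buckets, prefix):
    """Walk every word in one bucket; cost grows with the bucket size."""
    bucket = buckets[prefix.encode("utf-8")[0]]
    return sorted(w for w in bucket if w.startswith(prefix))


def prefix_bisect(sorted_words, prefix):
    """Alternative: binary-search to the first candidate, then read
    the contiguous run of words that share the prefix."""
    i = bisect.bisect_left(sorted_words, prefix)
    out = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        out.append(sorted_words[i])
        i += 1
    return out
```

With a sorted list the cost depends mostly on the number of matching words, not on how many words share a first character.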

Do any of the big search engines have wild card searches?

> It would be perfect if in this example the search for the second part
> could be reduced to records found by the first condition "key1=<string>". 
> Or in other words: it would be perfect if swish-e could JOIN results from 
> left to right. 

I don't know how that could be done, but I'd welcome any ideas.  Swish
only has an inverted word index -- you can't limit a search to a subset
of words based on file number.  The index would need to be redesigned
or additional tables would need to be created at search time to limit by
file numbers.
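
A rough sketch of that distinction, under stated assumptions: the data structures and function names below are hypothetical and far simpler than a real index. An inverted index maps word -> file numbers, so an AND fetches both posting lists in full and intersects them. A left-to-right "join" needs an extra table keyed by file number.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """word -> set of file numbers (the only structure an inverted index has)."""
    index = defaultdict(set)
    for file_no, text in docs.items():
        for word in text.split():
            index[word].add(file_no)
    return index


def search_and(index, w1, w2):
    """Fetch both posting lists in full, then intersect."""
    return index.get(w1, set()) & index.get(w2, set())


def build_forward_table(docs):
    """Hypothetical additional table: file number -> words in that file."""
    return {file_no: set(text.split()) for file_no, text in docs.items()}


def search_join(index, forward, w1, w2):
    """Left-to-right join: only inspect the files matched by the first word."""
    candidates = index.get(w1, set())
    return {f for f in candidates if w2 in forward[f]}
```

The join only pays off when the first term matches few files, and the forward table costs extra space or extra work at search time.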

> Or again in other words: do you use a query optimizer and if so, how
> does it work?

There's no optimizer.  Swish-e was originally written for indexing web
sites -- a few thousand pages typically.

Swish-e users would welcome any help from someone experienced in query
parsing and optimizing.

> My second question concerns another performance aspect:

That was more than one question!

> We are using properties. So, swish-e generates two big files,
> index.swish-e and index.swish-e.prop
> In our application index.swish-e.prop is twice as big as index.swish-e.
> Would it speed up the search if we copied one of these files
> or both into a ramdisk?

I doubt it.  But you could try and let us know.  Put index.swish-e in a
ramdisk first, as that's where most of the seeking happens during
searches (so I assume) -- the .prop file should only be used for
generating results, which should only be a page at a time.

But, I would suspect that the OS would be keeping the index file in RAM
anyway.

-- 
Bill Moseley
moseley@hank.org
Received on Thu May 27 06:33:07 2004