Why does searching write into the index structures? Is it required? Does
it cache the most recent searches in some way?
----- Original Message -----
From: "Bill Moseley" <email@example.com>
To: "Multiple recipients of list" <firstname.lastname@example.org>
Sent: Thursday, February 17, 2005 10:23 AM
Subject: [SWISH-E] Re: Perl API and mod_perl/Incremental
> On Thu, Feb 17, 2005 at 02:37:08AM -0800, Markus Peter wrote:
>> Can I already open the index files in my Apache mod_perl startup script
>> (=before the fork of the children) and it will automatically do the right
>> thing?
> I'm not sure. Searching writes into the index structures, so you are
> going to get a copy of the memory anyway (copy-on-write). Using a
> second mod_perl server with SWISHED (as Peter commented about) might
> be a bit more efficient memory-wise since there would be fewer child
> processes running swish. Whether that's worth the trade-off of running a
> second mod_perl server is something you would have to determine.
> The act of opening the index doesn't use that much RAM. Running
> searches can, though.
> Try opening the indexes in startup.pl and in child fork and report
> back the differences and how you measured it.
>> The other question I have is regarding incremental mode. So far I've
>> been using the traditional mode with cron jobs to update once or twice a
>> day, but I'd really like to convert the search to be "real time". How
>> stable is incremental mode? And "how incremental" is it? Can I use it,
>> to add/modify/remove documents from the search index on the fly, as they
>> are added/modified or is it rather targeted at batch processing a larger
>> number of updates (=merely a better merge)?
> I only know a little about incremental internals.
> It's not really on-the-fly. It uses a different index format -- a
> btree structure that allows updates. Deletions are made by marking
> that the file has zero words total, but doesn't really delete the
> word data. So the index continues to grow. It also means that the
> search engine really finds words from deleted files and then those
> files are checked to see if they have been deleted, and if so,
> not added to the result set.
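
To make the deletion scheme described above concrete, here is a minimal
sketch of the general "mark as deleted, filter at search time" idea. It is
not Swish-e code and does not use the Swish-e index format; the class and
method names (TombstoneIndex, add, delete, search, posting_size) are
hypothetical, and a plain dictionary stands in for the btree.

```python
from collections import defaultdict


class TombstoneIndex:
    """Toy inverted index illustrating delete-by-marking (hypothetical names)."""

    def __init__(self):
        # word -> set of document ids containing that word
        self.postings = defaultdict(set)
        # document id -> total word count; zero marks the document as deleted
        self.word_count = {}

    def add(self, doc_id, text):
        words = text.lower().split()
        for word in words:
            self.postings[word].add(doc_id)
        self.word_count[doc_id] = len(words)

    def delete(self, doc_id):
        # Only the per-document total is zeroed; the word data stays in
        # the postings, so the index never shrinks.
        self.word_count[doc_id] = 0

    def search(self, word):
        # Deleted documents are still found here, then filtered out.
        hits = self.postings.get(word.lower(), set())
        return sorted(d for d in hits if self.word_count.get(d, 0) > 0)

    def posting_size(self):
        # Total number of (word, document) entries held in the index.
        return sum(len(ids) for ids in self.postings.values())
```

In this sketch, deleting a document leaves posting_size() unchanged while
search() stops returning it, which mirrors both the continued index growth
and the extra check at search time mentioned in the quoted text.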
> It's not really on-the-fly because, although you can add files to an
> existing index, the final stages of indexing are still done every
> time a file is added -- namely the presorted indexes have to be
> rebuilt. I'm not 100% sure, but I suspect there's a time when the
> index is in an unstable state while adding files to the index.
> I tried a commercial search engine once -- I can't remember what it
> was (they kept emailing me for months after the "free trial" so you
> would think I would remember) -- but it truly allowed searches while
> it was indexing. The downside was it took f o r e v e r to run
> indexing, and searches were not that speedy. Yes, I suspect that was
> a trade-off for scalability.
> Bill Moseley
Received on Thu Feb 17 07:33:23 2005