Oops, I have been forgetting to change the subject on these
threads.
On Tuesday 15 April 2003 02:17 pm, Bill Moseley wrote:
> > 5. To update the site quickly I created a second inc. update
> > index. This also gets merged into the larger index in a
> > periodic manner, and after this merge the index doesn't exist
> > until the next update. This would produce an error with your
> > current swish.cgi and swish-e search, since I am asking for
> > an index which doesn't exist. I changed the swish.cgi such
> > that it accepts multiple indexes for searching, but checks
> > on their existence before passing the list onto the swish-e
> > executable, so there is no error. Is this a feature which
> > perhaps should be added to the swish.cgi or even swish-e, if
> > there are multiple index for search, ignore one if it is not
> > there instead of producing an error?
>
> There is code now to add files to an index.
> It has not been tested much, but should work well for archives. Grab cvs
> or a swish-daily and run ./configure --help
>
> So you have something like:
>
> full_index + incremental_index
>
> and once in a while you merge the incremental_index into the full_index
> and then incremental_index no longer exists. Is that the problem?
> I guess I'd stat the file, too. Or maybe create a dummy index for the
> incremental_index with a small file entry with one word. Stat()ing is
> probably faster.
Well, the full index of 150,000 pages turns out to be two files
each 200MB in size, and takes hours to produce. I create a daily
index (which takes a few seconds so that is fine) and then merge
at the end of the day.
During the day I re-create the incremental index every 15
minutes, and people are very happy with this. I ask swish-e to
search the two indexes, full + incr. But after the merge, the
incr. index doesn't exist for up to 15 minutes, so swish-e
produces an error when trying to search the two indexes.
I could change the config file for those 15mins each day, but
I don't really want to do that. So, I put in a test on the
existence of the index file in the swish.cgi. I test each
index asked for, and push it into a second array if it exists.
Then feed the second array of index files to swish-e, so there
is no error. Now the config files don't change, and the search
always works.
So, yes, this is just a stat on the index file to see if
it exists.
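A minimal sketch of that existence check follows. It is written in
Python only for illustration (swish.cgi itself is Perl), and the
function names and index file names are hypothetical, not taken from
swish.cgi. It assumes the usual swish-e invocation with -w for the
query and -f for the list of index files.

```python
import os


def existing_indexes(index_paths):
    """Return only the index files that are present on disk, in order."""
    return [path for path in index_paths if os.path.isfile(path)]


def build_search_command(query, index_paths, swish_binary="swish-e"):
    """Build the swish-e argument list, or None if no index exists.

    Missing indexes (for example the incremental index right after
    a merge) are silently dropped instead of being passed to swish-e.
    """
    available = existing_indexes(index_paths)
    if not available:
        return None
    return [swish_binary, "-w", query, "-f"] + available


if __name__ == "__main__":
    # Hypothetical file names; the configured list never changes.
    configured = ["full.index", "incr.index"]
    print(build_search_command("example", configured))
```

The configured list stays fixed, and only the list handed to swish-e
shrinks while the incremental index is absent. There is still a small
window between the check and the search in which a file could
disappear, so this reduces the errors rather than ruling them out.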
I worry that adding single files to the full index would take
too long. Right now re-making the incr. index takes a couple
of seconds. If the full index is 200MB, will it take less
than a couple of seconds to add a page to this index?
Is this "missing incr. index after merge" not a problem for
other sites?
Douglas
--
-----------------------------------------------------------
Douglas A. Smith douglas@slac.stanford.edu
Office: Bld 280, Rm 157 (650)926-2369
-----------------------------------------------------------
Received on Tue Apr 15 22:29:00 2003