Glad to see Apple is joining the swish-e ranks. :)
Aaron Levitt wrote on 09/30/2004 10:53 AM:
I ran the indexer with the following command:
> ./bin/swish-e -S prog -c swish.conf.
Can you send along the contents of swish.conf? Those might be helpful.
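For reference, a minimal swish.conf for the -S prog method usually just names the input program and the index file. This is only a sketch; the spider path, URL, and index name below are assumptions, not Aaron's actual config:

```
# minimal swish.conf sketch for -S prog (paths/URL are placeholders)
IndexDir ./prog-bin/spider.pl
SwishProgParameters default http://www.example.com/
IndexFile ./index.swish-e
```

With -S prog, IndexDir names the program to run (here the bundled spider.pl), and SwishProgParameters is passed to it as arguments.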
> So, I have the following questions:
> 1. I expect to have over 1,000,000 documents in our archives as things
> progress. Is this pushing the limits of swish-e?
I think there are folks on this list doing in excess of a million docs,
but perhaps in smaller groups, depending on how often they need to be
reindexed. One thing I like about swish-e is the ability to search
multiple indexes simultaneously.
So yes, I think you can do a million, but for admin purposes, you might
want to identify subsets and split them into smaller indexes.
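Searching several indexes at once is just a matter of listing them after -f at query time. A sketch, with made-up index names:

```sh
# query two yearly archive indexes together (index names are hypothetical)
./bin/swish-e -w 'some query' -f archive-2003.index archive-2004.index
```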
> 2. I have seen the indexer hit my robots.txt multiple times, is there a
> way to check on the progress to see if/when it will finish indexing?
Bill will likely have a better idea than me.
> 3. What should I do regarding the current index process? I'm afraid to
> stop it, because I don't want to have to start the indexing all over
Hmm. I'd let it go, just for curiosity's sake. But I understand your
concern. Is there a way you could benchmark the index size via the -S fs
method, so you know what you're aiming for? I'm just wondering if you
could identify whether the bottleneck is the spider or really is the
indexer itself.
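One way to run that comparison: point -S fs at a local copy of a known subset of the docs and time it, then compare against how long the spider takes over the same subset. The path below is a placeholder:

```sh
# time a filesystem index of a known subset (path is hypothetical)
time ./bin/swish-e -S fs -c swish.conf -i /var/www/archives/2004
```

If the -S fs run is much faster over the same documents, the spider (network fetching) is likely your bottleneck rather than the indexer.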
> 4. Do you have any recommendations on what I can do to improve this
Like I said above, splitting up the docs into subsets depending on how
often they need to be indexed can be helpful. It's also a nice way to
limit the scope of a search, just by selecting which indexes are
searched. That way you needn't futz with special metanames, etc.
Peter Karman - 651-605-9009 - firstname.lastname@example.org
Received on Thu Sep 30 09:11:48 2004