> Hi, I recently was trying to set up swish-e 1.1 to index the user web
> pages of this educational site. I generated a swish conf listing all
> the user web directories without any problems, and ran through
> small-scale tests. However, when I try to index the whole set of user
> pages (roughly 2500 directories with varying numbers of indexable files
> in them), it tends to just "stop" after a while. It'll be going
> along speedily, and then will start slowing down more and more. I left
> it running for over 3 days
> at one point, and it had only made it through about 1200 of the user
I did the same thing on my site of some 7000 (at the time) text files,
averaging about 20k each.
> The machine it's running on (the web server) is reasonably fast (Indy
> R4400SC/100) with 96MB of
> RAM and 128MB of swap. The machine isn't running out of swap doing
> this (although it does go 50-60MB into swap by the time it's at 1200
> users).
I'm using a mere P200 with 64MB ram/128MB swap.
> So, is what I'm trying to do not do-able with swish-e? Does something
> in swish-e's design (maybe it needs to rewrite a big chunk of the index
> file in memory every time it adds something) make it not
> scalable to the level of what I'm doing? And if it does manage to
That's what I would guess, but I haven't dug into the code. I do know that
it slows to a crawl as soon as it runs out of RAM.
> index it, would it be too slow in
> searching the index (the size of all the stuff being indexed is probably
> 100 or so megs)?
Search speed only seems to be a problem if people search for words that
are too common. (A good argument for a large stop list. Is there an easy
way to find the most frequent words in my files by going through
the indexes somehow?)
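Since the index file format isn't documented here, one workaround is to count
word frequencies over the source files directly rather than the index -- a
rough proxy, but the most frequent words in the files are the same ones that
bloat the index. A minimal sketch (the word pattern and the cutoff of 50 are
my assumptions):

```python
import collections
import pathlib
import re
import sys

def top_words(root, n=50):
    """Count word frequencies across all files under `root` and
    return the n most common -- candidates for a stop list."""
    counts = collections.Counter()
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        counts.update(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return counts.most_common(n)

if __name__ == "__main__":
    for word, freq in top_words(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{freq:8d}  {word}")
```

Anything near the top of that list that isn't a useful search term is a
stop-list candidate.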
> Any ideas? I was hoping swish-e would do the trick, since excite for
> web servers had failed abysmally (buggy and always died after a certain
> number of entries).
Make many small indexes; I made about a dozen of them. Then use the
index merge feature to create a larger index. Contrary to the
documentation, merging does not use half as much memory as the final
index size; twice as much seems to be the proper value. I know that it
kept running out of memory on my machine -- otherwise idle -- at the very
end of creating a 60MB index. So I have two 30MB ones instead and have
the searching use both.
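With 2500 user directories, generating the per-batch indexing commands by hand
gets tedious, so a small script can split the directory list into batches and
emit one command per batch plus the final merge. The `-c`/`-i`/`-f`/`-M` flags
below are my recollection of the swish-e 1.x options (config file, input
paths, index file, merge); check them against your local docs before running
anything:

```python
def batch_index_commands(directories, batch_size=200,
                         swish="swish-e", conf="swish.conf"):
    """Split a long directory list into batches and emit one swish-e
    indexing command per batch, plus a final merge command.

    The flag names are assumed from the swish-e 1.x manual and may
    need adjusting for your version.
    """
    batches = [directories[i:i + batch_size]
               for i in range(0, len(directories), batch_size)]
    commands = []
    parts = []
    for n, batch in enumerate(batches):
        index = f"index.part{n}"
        parts.append(index)
        commands.append(f"{swish} -c {conf} -i {' '.join(batch)} -f {index}")
    # Merge the partial indexes last; per the experience above, budget
    # roughly 2x the final index size in RAM for this step.
    commands.append(f"{swish} -M {' '.join(parts)} index.all")
    return commands

if __name__ == "__main__":
    for cmd in batch_index_commands([f"/home/user{k}/public_html"
                                     for k in range(2500)]):
        print(cmd)
```

If the merge itself runs out of memory, you can stop short of the final merge
and point the search front end at the partial indexes instead, as described
above.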
Sometime in the hopefully near future I'll be adding a lot more RAM to
this box, and maybe a second CPU, and will then be able to merge those
into one.
Received on Tue May 5 12:58:31 1998