I am a long-time user of SWISH-E, and greatly encouraged by the incremental
indexing work. I do a lot of spidering to map how websites link to each other
(the database is over 1B records), and I want to do keyword indexing as well.
Ideally I would like to do:
spider.pl -c config.file http://www.somewebsite.com | swish-e -S prog -i
where the idea is to avoid spidering a mirror copy to disk, and instead pipe
directly into swish-e with incremental indexing, such that I could have
200 of these command lines running, each indexing to its own .idx file.
At the end of the day, I would merge the 200 .idx files into 1 daily index.
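For the archives, here is a rough sketch of the kind of driver script I have in mind. It is only illustrative: the file names (site-list.txt, spider.conf, daily.N.idx) are made up, it assumes a SWISH-E build with incremental indexing enabled, and `wait -n` requires a reasonably recent bash.

```shell
#!/bin/bash
# Hypothetical sketch: fan out one spider-to-index pipeline per site,
# capped at 200 concurrent jobs, then merge the per-job indexes.
i=0
while read -r url; do
  i=$((i + 1))
  # -S prog -i stdin tells swish-e to read documents from standard input;
  # -f names the per-job index file.
  spider.pl -c spider.conf "$url" | swish-e -S prog -i stdin -f "daily.$i.idx" &

  # Keep at most 200 pipelines running at once.
  while [ "$(jobs -r | wc -l)" -ge 200 ]; do
    wait -n
  done
done < site-list.txt
wait

# Merge the per-job indexes into a single daily index.
swish-e -M daily.*.idx daily.idx
```

Whether `-M` merging behaves the same against incremental-format indexes is exactly the part I am unsure about.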
I tried a while ago to use the experimental incremental indexing, but I
couldn't get it all to work as described above.
Any pointers would be greatly appreciated.
Received on Tue Oct 10 09:56:38 2006