I'm using the latest version of Swish-e and have it working fine,
but I'm wondering whether, and how, Swish-e supports parallelism
and multiprocessors, both for indexing and for searching.

For indexing, I could handle it myself via the prog input method
(i.e., fork parallel processes that each independently index part
of a directory tree; e.g., with N processes, each is given a number
k and indexes its own 1/Nth of the documents). Then I could merge the
indexes at the end (or just pass them all to Swish-e with the -f
option when searching). But it would be easier if I could do this with
the simple file-system indexing method. Is there a configuration
option that tells Swish-e to index only every Nth file it encounters?
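
A minimal sketch of the "do it myself" partitioning, in Python. It assumes the prog input method reads documents from the program's standard output, each preceded by `Path-Name:` and `Content-Length:` headers and a blank line; that format, and passing the worker count and worker number as command-line arguments, are assumptions to check against the Swish-e documentation for `-S prog`. The names `files_for_worker`, `emit_document` and `partition.py` are hypothetical.

```python
import os
import sys


def files_for_worker(root, n_workers, worker_id):
    """Yield every n_workers-th file under root, starting at offset worker_id.

    Directories and file names are visited in sorted order so every worker
    sees the same sequence; the partitions are then disjoint and complete.
    """
    i = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk honors in-place edits to dirnames
        for name in sorted(filenames):
            if i % n_workers == worker_id:
                yield os.path.join(dirpath, name)
            i += 1


def emit_document(path, out):
    """Write one document in the assumed prog format: headers, blank line, body."""
    with open(path, "rb") as f:
        body = f.read()
    # Content-Length is the body size in bytes, not characters.
    header = "Path-Name: %s\nContent-Length: %d\n\n" % (path, len(body))
    out.write(header.encode("utf-8"))
    out.write(body)


def main(argv):
    # Hypothetical invocation: partition.py ROOT N_WORKERS WORKER_ID
    if len(argv) != 4:
        sys.stderr.write("usage: %s ROOT N_WORKERS WORKER_ID\n" % argv[0])
        return 2
    root, n_workers, worker_id = argv[1], int(argv[2]), int(argv[3])
    for path in files_for_worker(root, n_workers, worker_id):
        emit_document(path, sys.stdout.buffer)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Each worker's output would go into its own index file, and those files could then be merged or searched together with -f as described above. Sorting the walk is the key design choice: without a fixed order, two workers could disagree about which file is "the k-th one".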

Next, for searching: can Swish-e take advantage of parallelism? For
example, does it detect that it is running on a multiprocessor and
execute the search in parallel internally? If not, again, I could
handle this myself as follows. If I want to search in parallel on,
say, 8 processors, I would create 8 separate indexes as above, each
covering 1/8th of the files in the document corpus. Then, when
searching, I would fork 8 processes, each of which independently
searches one of the 8 indexes. Finally, I would collate the results
of these 8 parallel searches into one final result set. Would this
work? Or would it somehow screw up relevance ranking, since the
indexes are searched independently?
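
A rough sketch of the fork-and-collate step, again in Python. `search_one` is a hypothetical, caller-supplied function (it might wrap the swish-e command line or the SWISH::API Perl module) that returns `(score, path)` pairs sorted best-first; nothing here is part of Swish-e itself. Threads are used on the assumption that each worker mostly waits on an external search process, so 8 threads would still give 8 concurrent searches.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def parallel_search(index_files, query, search_one, max_workers=None, limit=None):
    """Search each index concurrently and merge the hits into one ranked list.

    search_one(index_file, query) is hypothetical and must return a list of
    (score, path) tuples sorted by descending score.
    """
    if not index_files:
        return []
    workers = max_workers or len(index_files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per index; map() returns results in index_files order.
        per_index = list(pool.map(lambda idx: search_one(idx, query), index_files))
    # Each per-index list is already sorted best-first, so a k-way merge suffices.
    merged = heapq.merge(*per_index, key=lambda hit: hit[0], reverse=True)
    # islice(..., None) keeps every hit when no limit is given.
    return list(itertools.islice(merged, limit))
```

Merging on raw scores is exactly where the relevance question bites: it assumes a score from one index means the same as a score from another. If the ranking depends on per-index statistics (for example, how many documents in that index contain a term), or if scores are normalized within each result set, hits from different indexes may not be directly comparable, and the merged order would only approximate a single-index search.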

In general, I'd like to hear about any support Swish-e has for
high-performance parallel/multiprocessor execution. I appreciate any
help you can give.
Users mailing list
Received on Fri Mar 13 17:20:15 2009