> All released versions of SWISH-E support index files up to 2 GB.
> The index file size will depend completely on what you choose to store
> in the index. My only advice is to test SWISH-E with your data.
Indeed. Though my current project is better suited to Nutch, I'm still
using SWISH-E for proof of concept and for early-adopter users. I
have a couple (of six) indices, of just over a million pages total, that
are near the 2 GB limit, and I found out the hard way about the limit and
about how many files can be in an index. ;-)
Basically, the more non-text or non-HTML docs (PDF, XLS, DOC) and the
larger the description text, the bigger the index file. I worked around
the file size and crawl duration limitations (some crawls are 120 hrs
plus) by segmenting the indices by content type, sort of poor man's DMOZ
style of
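As a rough illustration of segmenting by content type, here is a minimal Python sketch that groups document paths into per-type buckets, each of which could then be crawled and indexed separately to keep any one index under the 2 GB limit. The extension-to-segment mapping and the function name are hypothetical, made up for this example; the sketch does not call SWISH-E itself.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to index segment name.
# Adjust to whatever content types dominate your own crawl.
SEGMENT_BY_EXT = {
    ".html": "html",
    ".htm": "html",
    ".txt": "text",
    ".pdf": "pdf",
    ".doc": "office",
    ".xls": "office",
}


def segment_documents(paths):
    """Group document paths into per-content-type segments.

    Extensions are matched case-insensitively; anything without a
    known extension goes into the "other" segment.
    """
    segments = defaultdict(list)
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        segments[SEGMENT_BY_EXT.get(ext, "other")].append(path)
    return dict(segments)
```

Each resulting bucket would get its own index file, so the heavier document types do not push a single index toward the size limit.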
SWISH-E works very well in its intended application. I've got both the
current snapshot of Nutch and the 2.4.0 release of SWISH-E making the same
crawls for comparison's sake. Two vastly different tools for different
Received on Wed Jan 7 09:43:00 2004