lisab(at)not-real.hospitalsoup.com wrote on 11/13/12 3:03 AM:
> Hello, we've been using Swish on one of our servers, but I'm new to
> Swish, and we will be adding search capabilities for resume and
> healthcare policy information. I'm not familiar with indexing
> speed requirements for larger data sets and was wondering if
> anyone could give me guidelines on how long it may take to index close
> to 500,000 new documents? My plan was to schedule the indexing
> during off-peak hours, but knowing roughly how long a typical index
> run would take with those files added would help me
> as I try to get up to speed.
The time to index will depend on the size of the documents and how many fields
(MetaNames and PropertyNames) you have defined. In my experience, disk I/O is the
big bottleneck. I'd suggest profiling indexing on a smaller sample of your
documents and then extrapolating.
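A minimal sketch of the extrapolation step in Python, assuming indexing time grows roughly linearly with document count. That assumption may not hold if document sizes vary a lot or disk I/O saturates. The idea: time a few subsets of different sizes, fit a straight line, and project it to the full set. The function names and the timings below are hypothetical placeholders, not Swish-e measurements.

```python
# Rough indexing-time estimate from several timed sample runs.
# Assumption: seconds ~ a + b * docs, i.e. a fixed startup cost (a)
# plus a per-document cost (b). Real growth can be worse than linear,
# so treat the result as a ballpark, not a guarantee.


def fit_linear(samples):
    """Least-squares fit of seconds = a + b * docs over (docs, seconds) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two timed samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(s for _, s in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must use different document counts")
    sxy = sum((d - mean_x) * (s - mean_y) for d, s in samples)
    b = sxy / sxx            # seconds per document
    a = mean_y - b * mean_x  # fixed overhead in seconds
    return a, b


def estimate_seconds(samples, target_docs):
    """Project the fitted line out to target_docs documents."""
    a, b = fit_linear(samples)
    return a + b * target_docs


if __name__ == "__main__":
    # Hypothetical (docs indexed, wall-clock seconds) pairs; replace them
    # with your own timings of indexing runs on subsets of your documents.
    samples = [(5_000, 70.0), (10_000, 130.0), (20_000, 250.0)]
    secs = estimate_seconds(samples, 500_000)
    print(f"estimated: {secs / 3600:.1f} hours")  # prints "estimated: 1.7 hours"
```

Fitting a line through several sample sizes, rather than scaling up a single run, separates fixed startup cost from per-document cost. If the sample points drift away from the line as the subsets grow, that is an early hint that the full run will scale worse than linearly.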
>
> Also, we'd be honored to be included on the users page.
> The company is HospitalSoup.com; we provide Hospital Ratings, Reviews
> and HealthCare Information.
Thanks. Added in r3251; it should appear on the site in the next few hours.
>
> I also have to learn how to index PowerPoint documents, but that will
> be for another night's work. Thank you; your tips are most
> appreciated.
>
SWISH::Filter::pp2html and SWISH::Filter::pp2txt both claim to handle PowerPoint
docs. Check the docs for examples.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users(at)not-real.lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Nov 14 2012 - 04:50:45 GMT