On Thursday 27 July 2006 15:09, Michael Peters wrote:
> How large are these files and how many individual files would you create
> from them?
Um, well, it's the whole Vulgate bible! And the granularity I want is
the "verse" level. I currently have one document per book, so the number of
files would be the same as the number of verses in the bible, which is quite
a lot!
In fact, my data is in two broad sets: the Vulgate bible plus a series of
encoded liturgical MSS. These are already split into separate files (~7000,
each ~500-1000 bytes). On an ext3 filesystem these take up about 90MB; in an
ISO image they take up about 9MB!
> > The other thing I've just thought of is using the -S prog
> This is the approach I would take. If something that you want to index does
> not fit into swish-e's model very well (in this case 1 document == 1 hit)
> then filters are a good place to look. By running your files through a
> filter first, you can rearrange them into whatever you want.
> I would probably add a custom tag to whatever chunks you're spitting out to
> swish-e, indicating how to find that chunk again in the document it came
> from (just the filename and an XPath expression would probably work), and
> then use that in your results.
Yes, this is what I thought I'd try. Thanks for reassuring me that this course
is worth pursuing!
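
Here's roughly what I have in mind, as a minimal sketch (Python, writing
swish-e's "-S prog" protocol: header lines, a blank line, then exactly
Content-Length bytes per pseudo-document). The ./books path and the
<chapter>/<verse> element names are placeholders for my real markup:

#!/usr/bin/env python3
# Sketch of a swish-e "-S prog" input program: read each book file, split
# it into one pseudo-document per verse, and tag every chunk with its
# source file and an XPath-style locator so a hit can be traced back.
#
# Assumptions (adjust to the real markup): books live in ./books as XML,
# with <chapter n="..."> elements containing <verse n="..."> elements.
import glob
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def emit(path_name, content):
    """Write one document in the prog-method protocol: header lines,
    a blank line, then exactly Content-Length bytes of content."""
    data = content.encode("utf-8")
    out = sys.stdout.buffer
    out.write(("Path-Name: %s\n" % path_name).encode("utf-8"))
    out.write(("Content-Length: %d\n" % len(data)).encode("utf-8"))
    out.write(b"Document-Type: XML*\n\n")
    out.write(data)

for book in sorted(glob.glob("books/*.xml")):
    tree = ET.parse(book)
    for chapter in tree.iter("chapter"):
        for verse in chapter.iter("verse"):
            locator = "//chapter[@n='%s']/verse[@n='%s']" % (
                chapter.get("n"), verse.get("n"))
            text = escape("".join(verse.itertext()).strip())
            # Custom tags record where the chunk came from, so a search
            # result can be mapped back to file + XPath.
            doc = ("<verse><srcfile>%s</srcfile><srcpath>%s</srcpath>"
                   "<text>%s</text></verse>"
                   % (escape(book), escape(locator), text))
            emit("%s#%s" % (book, locator), doc)

The plan would then be to point the config at the script (IndexDir
./split_verses.py, plus PropertyNames srcfile srcpath) and index with
swish-e -c swish.conf -S prog, so every hit carries back the filename and
XPath needed to locate the verse in the original document.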
Sonic Arts Research Archive