Re: Using swish-e with one structured document

From: Richard Lewis <richardlewis(at)>
Date: Thu Jul 27 2006 - 14:22:10 GMT
On Thursday 27 July 2006 15:09, Michael Peters wrote:
> How large are these files and how many individual files would you create
> from them? 
Um, well, it's the whole Vulgate bible! And the granularity I want is 
the "verse" level. I currently have one document per book. So the number of 
files would be the same as the number of verses in the bible. Which is quite 
a lot!

In fact, my data is in two broad sets: that Vulgate bible plus a series of 
encoded liturgical MSs. These are already split into separate files (~7000, 
each ~500-1000 bytes). In an ext3 filesystem these take up about 90MB; in an 
ISO image they take up about 9MB!

> > The other thing I've just thought of is using the -S prog
> This is the approach I would take. If something that you want to index does
> not fit into swish-e's model very well (in this case 1 document == 1 hit)
> then filters are a good place to look. By running your files through a
> filter first, you can rearrange them into whatever you want.
> I would probably add a custom tag to whatever chunks you're spitting out to
> swish-e that indicated how to find that chunk again in the document it came
> from (just the filename and xpath expression would probably work) and then
> use that in your results.

Yes, this is what I thought I'd try. Thanks for reassuring me that this course 
is worth pursuing!
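
For illustration, here is a minimal sketch of such a splitter for use with 
-S prog. It is not tested against a real index. It assumes each book is one 
XML file whose verses are <verse> elements (a hypothetical tag name; the real 
encoding may differ), and the wrapper tags <doc>, <source>, <xpath> and <text> 
are made up for the example. Each chunk is written as headers (Path-Name, 
Content-Length, Document-Type), a blank line, then the content, which is the 
record layout swish-e reads from an external program. The "XML*" parser name 
and the "file#n" path scheme are choices to check against the swish-e docs.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def verse_records(xml_text, filename, verse_tag="verse"):
    """Yield one prog-style record (as bytes) per verse element.

    verse_tag is an assumption about how the source XML is encoded.
    """
    root = ET.fromstring(xml_text)
    for i, verse in enumerate(root.iter(verse_tag), start=1):
        # Positional XPath so the chunk can be found again in its source file.
        xpath = f"(//{verse_tag})[{i}]"
        text = "".join(verse.itertext()).strip()
        # Hypothetical wrapper tags carrying the locator and the verse text.
        body = (
            "<doc>"
            f"<source>{escape(filename)}</source>"
            f"<xpath>{escape(xpath)}</xpath>"
            f"<text>{escape(text)}</text>"
            "</doc>"
        ).encode("utf-8")
        header = (
            f"Path-Name: {filename}#{i}\n"
            f"Content-Length: {len(body)}\n"  # length in bytes, not characters
            "Document-Type: XML*\n"
            "\n"
        ).encode("utf-8")
        yield header + body


if __name__ == "__main__":
    # Usage: split_verses.py book1.xml book2.xml ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for record in verse_records(fh.read(), path):
                sys.stdout.buffer.write(record)
```

It would be run with something like "swish-e -S prog -i ./split_verses.py -c 
swish.conf", where split_verses.py and swish.conf are placeholder names. To 
get the locator back in search results, the config would presumably need to 
list source and xpath under PropertyNames.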

Richard Lewis
Sonic Arts Research Archive
Received on Thu Jul 27 07:22:12 2006