Re: [swish-e] Indexing page and chapter of a book (single document)

From: David Brown <dave(at)not-real.davidhbrown.us>
Date: Tue Jan 12 2010 - 15:12:11 GMT
Peter observed: "disk space is less a concern than it was even 5 years ago"

Unless, of course, you happen to still be hosting your client's site with
the same virtual server provider you were using 15 years ago, which figured
out they could offer unlimited bandwidth if they gave you very limited disk
space :-)

But for something relatively unchanging, the cached file approach makes good
sense; you could probably even build the index on a development machine and
copy the files to deploy. 

Dave
--
Dave Brown
dave@davidhbrown.us

-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
Sent: Sunday, January 10, 2010 12:14 AM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] Indexing page and chapter of a book (single document)

David Brown wrote on 1/9/10 7:45 AM:
> Quick thought for you. If you can add the XML to the files, you should 
> be able to write a program that uses those tags to present each chapter 
> (or even page) individually to swish-e while indexing (programmatically, 
> not via the file system) and still refer to the location of the 
> composite file, possibly even using anchor tags (mybook.htm#chap3 for 
> example).

Dave has the right idea.

swish-e can only report properties (page, chapter, etc.) per document. Each
"hit" represents exactly one document.

So your best bet is to break each file into virtual "pages" and store the
page and chapter values in meta tags. You could do that with the -S prog
input method and a script that parses each file and generates the
appropriate XML output.
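A minimal sketch of such a -S prog script, in Python, assuming the book has already been marked up so that every page is wrapped in a hypothetical <page chapter="3" number="12"> ... </page> element (PAGE_RE would need adjusting to whatever tags you actually add). Each page is emitted as its own small XML document, preceded by the Path-Name and Content-Length headers and a blank line, which is the record format the prog method reads. The script name, the element names, and the #pageN anchor in Path-Name (following Dave's suggestion of pointing back at the composite file) are all illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical split_book.py: feed each page of one book file to swish-e -S prog."""
import html
import re
import sys
from xml.sax.saxutils import escape

# Assumed markup: <page chapter="3" number="12"> ... </page> around every page.
PAGE_RE = re.compile(
    r'<page\s+chapter="(?P<chapter>[^"]*)"\s+number="(?P<page>[^"]*)"\s*>'
    r"(?P<body>.*?)</page>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def split_pages(text):
    """Yield (chapter, page, body) for each <page> element, in document order."""
    for m in PAGE_RE.finditer(text):
        yield m.group("chapter"), m.group("page"), m.group("body")


def page_to_xml(chapter, page, body):
    """Wrap one page as a small XML document.

    Inner markup is stripped, entities decoded and whitespace collapsed; the
    text is then re-escaped. chapter and page become elements that could be
    declared as MetaNames/PropertyNames in the swish-e config.
    """
    text = " ".join(html.unescape(TAG_RE.sub(" ", body)).split())
    return (
        "<doc>"
        f"<chapter>{escape(chapter)}</chapter>"
        f"<page>{escape(page)}</page>"
        f"<text>{escape(text)}</text>"
        "</doc>"
    )


def prog_record(path_name, xml):
    """Format one document for the prog stream: headers, blank line, body.

    Content-Length is a byte count, so it is measured on the encoded body.
    """
    data = xml.encode("utf-8")
    header = f"Path-Name: {path_name}\nContent-Length: {len(data)}\n\n"
    return header.encode("utf-8") + data


def main(book_path):
    with open(book_path, encoding="utf-8") as f:
        text = f.read()
    out = sys.stdout.buffer
    for chapter, page, body in split_pages(text):
        # Each hit points back at the composite file via a #pageN anchor.
        record = prog_record(f"{book_path}#page{page}", page_to_xml(chapter, page, body))
        out.write(record)
    out.flush()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

You would then declare chapter and page as MetaNames and PropertyNames in the config, have swish-e parse these records as XML (for example with DefaultContents XML*), and run something like ./split_book.py mybook.xml | swish-e -c book.conf -S prog -i stdin, where the script and config file names are placeholders.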

IME, it's easiest to actually write the XML output to a cached file and then
index with the -S fs method (i.e., create actual files rather than virtual
ones). Caching them as real files makes it easier to debug and then reindex
when necessary. Terabyte drives being as cheap as they are these days, disk
space is less a concern than it was even 5 years ago.
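For the cached-file variant, here is a sketch of the one-off step that writes each page to disk, again assuming the hypothetical <page chapter="..." number="..."> markup. The ch{chapter}-p{page}.xml naming and the <source> element carrying a link back to the composite file are illustrative choices, not anything swish-e requires; they stand in for the anchor idea, since the cached file's own path no longer points at the original book.

```python
"""Hypothetical cache step: write each page of a book to its own XML file."""
import html
import re
from pathlib import Path
from xml.sax.saxutils import escape

# Assumed markup: <page chapter="3" number="12"> ... </page> around every page.
PAGE_RE = re.compile(
    r'<page\s+chapter="(?P<chapter>[^"]*)"\s+number="(?P<page>[^"]*)"\s*>'
    r"(?P<body>.*?)</page>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def write_page_cache(book_text, out_dir, source_name):
    """Write one XML file per page into out_dir and return the paths written.

    source_name is the URL of the composite file; it is stored in a <source>
    element with a #pageN anchor so search results can link back to it.
    Chapter and page values are assumed to be safe to use in file names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for m in PAGE_RE.finditer(book_text):
        chapter, page = m.group("chapter"), m.group("page")
        # Strip inner tags, decode entities, collapse whitespace.
        text = " ".join(html.unescape(TAG_RE.sub(" ", m.group("body"))).split())
        xml = (
            "<doc>"
            f"<source>{escape(source_name)}#page{escape(page)}</source>"
            f"<chapter>{escape(chapter)}</chapter>"
            f"<page>{escape(page)}</page>"
            f"<text>{escape(text)}</text>"
            "</doc>\n"
        )
        path = out / f"ch{chapter}-p{page}.xml"
        path.write_text(xml, encoding="utf-8")
        written.append(path)
    return written
```

The cache directory can then be indexed like any other tree, e.g. swish-e -c book.conf -S fs -i cache/, with source, chapter and page declared as PropertyNames so the results can show them. A rebuild is just rerunning the script and reindexing, and the generated files can be opened directly when something needs debugging.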

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users

Received on Tue Jan 12 10:12:24 2010