
Re: multiple documents in one file

From: <moseley(at)not-real.hank.org>
Date: Tue Nov 11 2003 - 14:34:19 GMT
On Mon, Nov 10, 2003 at 04:20:26PM -0800, Chris Kantarjiev wrote:
> I have a set of files in roughly this format:
> 
> -----------------------
> Name
> Multi-line Address
> Email
> URL
> 
> [Date] Multi-line body
> -----------------------
> Name
> Multi-line Address
> Email
> URL
> 
> [Date] Multi-line body
> -----------------------
> Name
> Multi-line Address
> Email
> URL
> 
> [Date] Multi-line body
> -----------------------
> 
> .. and so on. I can put together a perl script to return them one
> at a time for swish-e to index, but what isn't clear to me is
> how to tell it that each of these is an individual document. That is,
> when someone gets a hit in a document, I want to return just
> that document, not the entire file.

I'm not sure I understand your question. If you look at the swish-e
documentation search at:

  http://swish-e.org/current/docs/search.html

you can see that a search returns links that include a fragment (the
part after the "#"). That's possible because the source documents have
<a name> tags.

When indexing, I index each section as a separate document and append
the fragment to the file name.
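Here is a minimal sketch of that approach for your format. It is written in Python rather than Perl. It splits each file on the dashed separator lines and writes every record to stdout in swish-e's "prog" input format: a Path-Name: and a Content-Length: header, a blank line, then the content. It appends a #N fragment to the file name so that each hit points at a single record. The separator regex and the #N numbering are assumptions. The script takes file names from the command line, and swish-e's SwishProgParameters directive is one way to pass them.

```python
import re
import sys

# Assumption: a record separator is a line of three or more dashes,
# as in the sample quoted above.
SEPARATOR = re.compile(r"^-{3,}\s*$")


def split_records(text):
    """Return the non-empty records found between separator lines."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if SEPARATOR.match(line):
            records.append("".join(current))
            current = []
        else:
            current.append(line)
    records.append("".join(current))
    # Drop empty chunks, e.g. before the first separator or after the last.
    return [r for r in records if r.strip()]


def prog_documents(path, text):
    """Yield one swish-e "prog" document (as bytes) per record.

    Path-Name gets a hypothetical "#N" fragment so a hit identifies its
    record; Content-Length is the body length in bytes.
    """
    for n, record in enumerate(split_records(text), start=1):
        body = record.encode("utf-8")
        header = f"Path-Name: {path}#{n}\nContent-Length: {len(body)}\n\n"
        yield header.encode("utf-8") + body


if __name__ == "__main__":
    # File names come from the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for doc in prog_documents(path, fh.read()):
                sys.stdout.buffer.write(doc)
```

You would run it with something like "swish-e -S prog -i ./split_records.py". The search script then gets back paths such as "mail.txt#3" and can use the number to show only that record.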

For your above example, you can index in chunks and just place the data
you want returned in properties for each chunk.  Then you don't have to
access the source file at all during a search.
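To store the fields themselves as properties, one option is to emit each record as a small XML document. swish-e's XML parser can then map elements to properties declared with PropertyNames, and to searchable fields declared with MetaNames. This assumes the parser is selected for these documents, for example with a "Document-Type: XML*" header in the prog output. The sketch below is hypothetical. The field names are made up, and so is the rule that the email is the first header line containing "@", with the URL on the line after it. Both are guesses based on the layout quoted above.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical: an optional "[date]" at the start of the body.
DATE_RE = re.compile(r"^\[([^\]]*)\]\s*")


def parse_record(record):
    """Split one record into named fields using a heuristic layout."""
    lines = record.strip("\n").split("\n")
    # Header lines end at the first blank line; the rest is the body.
    blank = lines.index("") if "" in lines else len(lines)
    head, body = lines[:blank], "\n".join(lines[blank + 1:])
    # Assumption: the email is the first header line containing "@",
    # and the URL (if any) is the line right after it.
    email_i = next((i for i, line in enumerate(head) if "@" in line), None)
    if email_i is None:
        address, email, url = head[1:], "", ""
    else:
        address, email = head[1:email_i], head[email_i]
        url = head[email_i + 1] if email_i + 1 < len(head) else ""
    m = DATE_RE.match(body)
    return {
        "name": head[0] if head else "",
        "address": "\n".join(address),
        "email": email,
        "url": url,
        "date": m.group(1) if m else "",
        "body": body[m.end():] if m else body,
    }


def record_to_xml(record):
    """Wrap each field in an element named after it (escaped for XML)."""
    fields = parse_record(record)
    inner = "".join(f"<{k}>{escape(v)}</{k}>" for k, v in fields.items())
    return f"<record>{inner}</record>"
```

The config would then list the elements to keep, e.g. "PropertyNames name address email url date". That lets the results page print them without opening the source file.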

If the chunks are bigger than you want to store as properties, then you
either need to split the source into individual files, or add some kind
of tag to each indexed chunk (like an offset number) and then have your
search script seek to that location in the file.
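The offset idea could look roughly like this sketch. At indexing time it records the byte offset where each record starts, which could be stored as a property or in the Path-Name fragment. At search time it seeks to that offset and reads until the next separator line. The function names and the separator regex are assumptions, not anything swish-e provides.

```python
import re

# Assumption: same dashed separator as the sample, matched on raw bytes.
SEPARATOR = re.compile(rb"^-{3,}\s*$")


def record_offsets(path):
    """Return the byte offset of the first line of each record (index time)."""
    offsets, pos, at_start = [], 0, True
    with open(path, "rb") as fh:
        for line in fh:
            if SEPARATOR.match(line):
                at_start = True
            elif at_start and line.strip():
                # First non-blank line after a separator starts a record.
                offsets.append(pos)
                at_start = False
            pos += len(line)
    return offsets


def read_record(path, offset):
    """Seek to a stored offset and return text up to the next separator."""
    lines = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for line in fh:
            if SEPARATOR.match(line):
                break
            lines.append(line)
    return b"".join(lines).decode("utf-8")
```

Working in binary mode keeps the offsets exact byte positions, so seek() lands where the indexer saw the record start.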


-- 
Bill Moseley
moseley@hank.org
Received on Tue Nov 11 14:35:19 2003