(meta)data API needed for swish-e?

From: Dan Brickley <Daniel.Brickley(at)not-real.bristol.ac.uk>
Date: Sat Aug 21 1999 - 21:17:48 GMT
On Sat, 21 Aug 1999, David Norris wrote:

> > Are there plans to add timestamps to the database?
> > UNIX time since epoch would be enough...
> > Any hints?
> 
> Well, I see one potentially confusing problem with that.  As the index becomes outdated, the
> timestamps will become outdated as well.  This may or may not be a problem, of course.
> 
> I don't see any trouble with the search script gathering timestamps from the filesystem.  It
> really isn't a speed or technical issue, as it is fast and simple.  My search script rarely
> takes more than 30 - 50 milliseconds to generate results while reading several parameters,
> including timestamp and various metadata fields, from the file system.  I consider that a bit
> slow, though not unreasonable.
> 
> This would be useful if the files aren't locally stored, though.  HTTP I/O has tremendous
> latency.  If you're willing to have a few incorrect modification dates, then it would be useful.
> 
> A problem I see with adding this and that to the index: when does the index file begin to
> approach or exceed the total size of the files indexed?  If everything is in the index, then why
> even have them in the file system?  Convenience?

An intermediate approach would be to pull this (meta)data from the
filesystem but cache it in an intermediate store accessible through some
common API, so the decision as to whether it was indexed or left in the
filesystem was hidden from application authors.

I'm thinking, for example, of being able to write
my $title = $metadatadb->getPropertyValue("file:/docs/doc1.html", "DC:Title");

...and so forth, regardless of whether the data is in the index or
filetree.
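
To make that a bit more concrete, here is a rough Perl sketch of such a
lookup. Everything in it is an assumption made for illustration: the
MetadataStore class, the 'index' hash standing in for whatever SWISH-E
keeps in its index file, and the 'timestamp' property name are all
hypothetical and not part of the SWISH-E distribution. It answers from
the index data when it can, falls back to the filesystem otherwise, and
caches whatever it finds.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not part of SWISH-E.
package MetadataStore;

sub new {
    my ($class, %args) = @_;
    # 'index' stands in for metadata already held in an index file:
    #   { uri => { property => value } }
    my $self = { index => $args{index} || {}, cache => {} };
    return bless $self, $class;
}

# Map a "file:/path" URI to a local path; undef for anything else.
sub _path_for {
    my ($self, $uri) = @_;
    return $uri =~ m{^file:(/.*)$} ? $1 : undef;
}

# Read a property directly from the filesystem.
sub _from_filesystem {
    my ($self, $uri, $property) = @_;
    my $path = $self->_path_for($uri);
    return unless defined $path && -f $path;

    if ($property eq 'timestamp') {
        return (stat $path)[9];    # mtime, seconds since the epoch
    }
    if ($property eq 'DC:Title') {
        open my $fh, '<', $path or return;
        local $/;                  # slurp mode: read the whole file
        my $html = <$fh>;
        close $fh;
        # Naive <title> match; a real HTML parser would be more robust.
        return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
    }
    return;
}

# The caller never learns whether the value came from the index,
# the filesystem, or the cache.
sub getPropertyValue {
    my ($self, $uri, $property) = @_;
    my $key = "$uri\0$property";
    return $self->{cache}{$key} if exists $self->{cache}{$key};

    my $indexed = $self->{index}{$uri};
    my $value = ($indexed && exists $indexed->{$property})
        ? $indexed->{$property}
        : $self->_from_filesystem($uri, $property);

    $self->{cache}{$key} = $value;
    return $value;
}

package main;

my $metadatadb = MetadataStore->new(
    index => { 'file:/docs/doc1.html' => { 'DC:Title' => 'Indexed title' } },
);
my $title = $metadatadb->getPropertyValue('file:/docs/doc1.html', 'DC:Title');
my $mtime = $metadatadb->getPropertyValue('file:/docs/doc1.html', 'timestamp');
print defined $title ? "$title\n" : "(no title)\n";
```

Note that the cache in this sketch never expires, so it has the same
staleness problem David describes for timestamps stored in the index; a
real version would need some rule for refreshing entries.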

Actually I'm hinting at the use of an RDF-based API here though that
might be overkill for some of you! If there is Perl code floating
around that would get the title/keywords/properties/timestamp and
other metadata for a given file given a filename (either from the
filesystem or from the index file), I'd be more than willing to spend a
bit of time wrapping it in an RDF API, and we've got some code here
already that might help with such a thing. It's been a while since I
last looked at the SWISH-E distribution so any 'getting started'
pointers with this would be much appreciated.

cheers,

Dan
Received on Sat Aug 21 14:12:40 1999