
Probs with xml-marc format

From: Thoreau Lovell <tlovell(at)>
Date: Thu Feb 12 2004 - 22:59:17 GMT

I'm trying to use Swish-e for a project in the Library at San Francisco 
State and am beginning to wonder if it is the right tool. Sorry for the 
long post, but I think some background will be helpful.

We get a list of journals for which we have online access to full-text 
articles from a vendor, in either HTML or XML. We're talking, say, 20,000 to 
40,000 journals. The list is exported as a separate document for each letter 
of the alphabet, so A--.html has all the journals whose titles start with the 
letter "A". I've indexed both the HTML and XML versions of the files and 
can search them using the swish.cgi program.
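Because each per-letter export is indexed as one document, a hit can only point back to a whole file such as A--.html. One possible workaround, sketched below under stated assumptions, is to split each export into one document per journal record before indexing, so that every result corresponds to a single journal. The sketch assumes the XML export wraps each journal in a <record> element under a single root with no namespace (the real vendor file may differ), and it writes records in the header-plus-body stream that Swish-e's "-S prog" input method reads (Path-Name, Content-Length in bytes, Document-Type). The "#recordN" path suffix and the sample data are invented for illustration; check the Swish-e documentation for the exact header set before relying on it.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical per-letter export: several <record> elements under one root.
# Assumes no MARCXML namespace; real vendor files may declare one.
SAMPLE = """<collection>
  <record>
    <datafield tag="245" ind1="" ind2="4">
      <subfield code="a">The American chiropractor</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="" ind2="">
      <subfield code="a">Example journal B (placeholder)</subfield>
    </datafield>
  </record>
</collection>"""


def split_records(xml_text, source_name):
    """Yield (path_name, xml_bytes) for each <record> in one export file."""
    root = ET.fromstring(xml_text)
    for i, record in enumerate(root.iter("record")):
        # Invented path convention: one unique "file" per journal record.
        path = f"{source_name}#record{i}"
        yield path, ET.tostring(record, encoding="utf-8")


def write_prog_stream(docs, out):
    """Write each document as a header block, a blank line, then the body.

    Assumed headers (per Swish-e's prog input method): Path-Name,
    Content-Length (body size in bytes) and Document-Type.
    """
    for path, body in docs:
        header = (f"Path-Name: {path}\n"
                  f"Content-Length: {len(body)}\n"
                  f"Document-Type: XML*\n\n")
        out.write(header.encode("ascii"))
        out.write(body)


if __name__ == "__main__":
    write_prog_stream(split_records(SAMPLE, "A--.xml"), sys.stdout.buffer)
```

In practice the script would read the real per-letter files instead of SAMPLE; the invented paths only need to be unique per record so each search hit identifies one journal.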

The problem is how the found set is returned. Searching for American 
Chiropractor, for instance, tells me only that the journal is found in A--.html. 
I can't get Swish-e to return any of the more useful data elements 
(journal title, ISSN, coverage, source), all of which are present in the indexed 
files. This seems like a situation where the structured nature of XML 
should help, so I've focused on the XML documents. One problem may 
be that the vendor's format is XML-MARC, which seems to give Swish-e 
some trouble. Here's a snippet of what the data looks like:

<datafield tag="022" ind1="" ind2="">
         <subfield code="a">0194-6536</subfield>
</datafield>
<datafield tag="245" ind1="" ind2="4">
         <subfield code="a">The American chiropractor</subfield>
</datafield>
<datafield tag="210" ind1="" ind2="">
         <subfield code="a">AMERICAN CHIROPRACTOR</subfield>
</datafield>
<datafield tag="090" ind1="" ind2="">
         <subfield code="a">110978978735405</subfield>
</datafield>
<datafield tag="866" ind1="" ind2="">
         <subfield code="x">Alt-HealthWatch:Full Text</subfield>
         <subfield code="a"> Availability: from 1998</subfield>
</datafield>
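The structure needed is already in the data: every field is a datafield identified by its tag attribute, and each value is a subfield identified by its code attribute. The Python sketch below pulls the four elements mentioned earlier out of one record using ElementTree's attribute predicates. The snippet is reproduced with closing tags restored and wrapped in a hypothetical <record> element. The tag/code mapping is read off this sample only: 022 $a is the ISSN and 245 $a the title in MARC, while treating 866 $x as the source and 866 $a as the coverage is an assumption based on how this vendor's data looks.

```python
import xml.etree.ElementTree as ET

# The snippet above, with closing tags restored, inside a hypothetical <record>.
RECORD = """<record>
  <datafield tag="022" ind1="" ind2=""><subfield code="a">0194-6536</subfield></datafield>
  <datafield tag="245" ind1="" ind2="4"><subfield code="a">The American chiropractor</subfield></datafield>
  <datafield tag="210" ind1="" ind2=""><subfield code="a">AMERICAN CHIROPRACTOR</subfield></datafield>
  <datafield tag="090" ind1="" ind2=""><subfield code="a">110978978735405</subfield></datafield>
  <datafield tag="866" ind1="" ind2="">
    <subfield code="x">Alt-HealthWatch:Full Text</subfield>
    <subfield code="a"> Availability: from 1998</subfield>
  </datafield>
</record>"""

# (MARC tag, subfield code) -> friendly name; assumed from this sample only.
FIELDS = {
    ("022", "a"): "issn",
    ("245", "a"): "title",
    ("866", "x"): "source",
    ("866", "a"): "coverage",
}


def subfield_text(record, tag, code):
    """Return the stripped text of the first matching subfield, or None."""
    sf = record.find(f"datafield[@tag='{tag}']/subfield[@code='{code}']")
    return None if sf is None else (sf.text or "").strip()


def extract(record):
    """Map each friendly name to its subfield text for one record."""
    return {name: subfield_text(record, tag, code)
            for (tag, code), name in FIELDS.items()}


rec = extract(ET.fromstring(RECORD))
print(rec["source"])    # Alt-HealthWatch:Full Text
print(rec["coverage"])  # Availability: from 1998
```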

I've experimented with XMLClassAttributes and UndefinedXMLAttributes, 
without much luck.
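Part of the difficulty is that every field uses the same two element names, datafield and subfield, and differs only in attribute values. As an alternative to XMLClassAttributes, a preprocessing step could rewrite each record into elements with distinct names, which could then be listed under Swish-e's MetaNames (to search a single field) and PropertyNames (to store the text for display in results). A minimal Python sketch, assuming a hypothetical mapping read off the sample (022 $a, 245 $a, 866 $x, 866 $a); the element names journal, issn, title, source and coverage are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from (MARC tag, subfield code) to a new element name.
RENAME = {
    ("022", "a"): "issn",
    ("245", "a"): "title",
    ("866", "x"): "source",
    ("866", "a"): "coverage",
}


def flatten(record):
    """Build a <journal> element with one child per mapped subfield."""
    journal = ET.Element("journal")
    for df in record.iter("datafield"):
        for sf in df.iter("subfield"):
            name = RENAME.get((df.get("tag"), sf.get("code")))
            if name is not None:
                ET.SubElement(journal, name).text = (sf.text or "").strip()
    return journal


src = ET.fromstring(
    '<record>'
    '<datafield tag="022" ind1="" ind2=""><subfield code="a">0194-6536</subfield></datafield>'
    '<datafield tag="866" ind1="" ind2="">'
    '<subfield code="x">Alt-HealthWatch:Full Text</subfield>'
    '<subfield code="a"> Availability: from 1998</subfield>'
    '</datafield>'
    '</record>'
)
print(ET.tostring(flatten(src), encoding="unicode"))
# <journal><issn>0194-6536</issn><source>Alt-HealthWatch:Full Text</source><coverage>Availability: from 1998</coverage></journal>
```

With records in this shape, a config might list issn, title, source and coverage under both MetaNames and PropertyNames; whether swish.cgi then shows those properties in its results depends on its own display settings, which this sketch does not cover.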

What I'd like to see is a search result like this:

         Alt-HealthWatch:Full Text
         Availability: from 1998

Am I barking up the wrong tree trying to get this to work with Swish-e?



Thoreau Lovell
Digital Systems Design and Development Coordinator
J. Paul Leonard Library, San Francisco State University
415-338-2285
Received on Thu Feb 12 14:59:18 2004