Specifications for an OLAC metadata display format and an OLAC-to-OAI_DC crosswalk

Date issued: 2002-08-10
Status of document: Draft Informational Note. This is only a preliminary draft that is still under development; it has not yet been presented to the whole community for review.
This version: http://www.language-archives.org/NOTE/olac_display-20020810.html
Latest version: http://www.language-archives.org/NOTE/olac_display.html
Previous version: None.
Abstract:

Specifies OLAC_Display, the OLAC metadata display format implemented by the OLAC Aggregator service. This format is a reader-friendly view of OLAC metadata that incorporates attribute values into the element content and translates coded values into display labels. The document further specifies the transformation from OLAC_Display format to OAI_DC format.

Editors: Gary Simons, SIL International (mailto:gary_simons@sil.org)
Changes since previous version:
Copyright © 2002 Gary Simons (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Display format strategy
  3. Element transformations
  4. An OLAC-to-OAI_DC crosswalk
References

1. Introduction

In order to improve recall and precision in searching, the OLAC metadata format [OLACMS] uses attributes to support resource description using controlled vocabularies. Service providers may use these attributes to perform precise searches. However, service providers also need to be able to display metadata records to users in a manner that shows all available information in an easy-to-read form. Not only does this involve combining attributes with element content to produce a display of all information pertaining to a metadata element, but it also requires that coded attribute values (such as three-letter language codes) be translated into friendly display forms.

Transforming OLAC metadata records into such a display format is a non-trivial task that each service provider should not have to implement independently. Thus the OLAC Aggregator [OLACA] offers such a translation service. It supports a metadata format named OLAC_Display. When metadata are harvested using this metadata prefix, the content of the metadata elements contains a reader-friendly view of all the information associated with the element. For instance,

http://www.language-archives.org/cgi-bin/olaca.pl?
   verb=GetRecord&metadataPrefix=olac&identifier=oai:ethnologue:AAA

will retrieve the metadata in OLAC format as specified in [OLACMS], whereas

http://www.language-archives.org/cgi-bin/olaca.pl?
   verb=GetRecord&metadataPrefix=olac_display&identifier=oai:ethnologue:AAA

retrieves the same metadata record in the reader-friendly form specified in this document.
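
As a minimal sketch of how a service might build these two requests, the following Python function assembles the GetRecord URL for a given metadata prefix. The function name get_record_url and the constant BASE_URL are illustrative assumptions; only the base URL, the query parameters, and the sample identifier come from the examples above.

```python
from urllib.parse import urlencode

# Base URL of the OLAC Aggregator as given in the examples above.
BASE_URL = "http://www.language-archives.org/cgi-bin/olaca.pl"


def get_record_url(identifier, metadata_prefix):
    """Build a GetRecord request URL (hypothetical helper, not part of OLAC)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":",  # keep the colons of the OAI identifier readable
    )
    return f"{BASE_URL}?{query}"


olac_url = get_record_url("oai:ethnologue:AAA", "olac")
display_url = get_record_url("oai:ethnologue:AAA", "olac_display")
```

The two URLs differ only in the metadataPrefix parameter, which is what selects between the coded form and the reader-friendly form of the same record.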

In order to participate in the wider Open Archives Initiative (OAI) community of service providers, OLAC data providers must publish their metadata records in both the OLAC format and the Dublin Core format prescribed by the OAI [OAI_DC]. There is no need for data providers to store the records in both formats, however, since the OAI_DC format is a subset of the OLAC format. An OAI_DC record may thus be automatically derived from an OLAC record. A program that transforms a metadata record from one format to another is conventionally called a "crosswalk"; see [Day2001] for other examples of crosswalks and pointers to discussions of crosswalking issues.

It turns out that implementing an OLAC-to-OAI_DC crosswalk involves the same kind of transformation of attribute values that is involved in generating the reader-friendly OLAC_Display format. The final section of this paper describes additional transformations performed by the OLAC Aggregator to achieve an OLAC-to-OAI_DC crosswalk. In addition to documenting the transformation made by the community's centralized OLAC-to-OAI_DC crosswalk, this note can be used as a specification by those who implement an OLAC-to-OAI_DC crosswalk in their own data provider.

2. Display format strategy

The XML schema that implements the OLAC metadata set uses seven devices for recording information:

  1. The basic DC metadata element name (e.g. format in <format> and <format.markup>)

  2. A metadata element refinement expressed in a compound tag name (e.g. markup in <format.markup>)

  3. The value of a metadata element expressed as XML element content

  4. The value of a metadata element expressed as the value of the code attribute

  5. A metadata element refinement expressed in a refine attribute

  6. The encoding scheme for the element content expressed in the scheme attribute

  7. The language of the element content expressed in the lang attribute

A straightforward display of OLAC metadata that shows only the element tag and the element content includes only items 1 through 3. But it is critical that item 4 also be displayed, since it is an alternative way of expressing the value of a metadata element. Items 5 and 6 are similarly important because they qualify the meaning of the element value. Only item 7 seems unnecessary as part of the display form of a metadata element.

It is not enough, however, to incorporate the attribute values into a presentation of the element content. This is because the attribute values are typically coded values; the display form must also translate the coded values to display labels. Furthermore, there should be a standard display template that uses punctuation in a consistent way to set off the various pieces of information. Thus, a metadata element like the following,

<element scheme="S" code="C" refine="R">Content</element>

translates to the following display form:

Label-for-S: Label-for-C, Content [Label-for-R]

The strategy for the OLAC_Display format is to provide the attribute information both as coded attribute values and as display strings incorporated into the element content. Thus, the schema for the olac_display metadata format supported by the OLAC Aggregator is identical to the schema for the olac metadata format. In this way, services that harvest records from the OLAC Aggregator in OLAC_Display format can still use the coded attribute values to support high recall and precision in queries, and at the same time have the convenience of all the attribute information being incorporated into the element content in a reader-friendly view.

The next section illustrates the exact transformations made in converting the information recorded by means of items 3 through 6 into annotations in the DC element content.

3. Element transformations

This section illustrates how the content string for the OLAC_Display format is generated from the information in an OLAC element. The discussion is organized in terms of the attributes present in the element:

  1. The OLAC element has no attributes.

    In this case the transformation simply copies the original content into the display form content. For instance,

    OLAC:         <coverage>19th century</coverage>
    OLAC_Display: 19th century

  2. The OLAC element has the code attribute.

    In this case the transformation copies the display label for the coded value into the display form content. If the OLAC element has both a coded value and a free value in the content, a comma is inserted to separate the two. For instance,

    OLAC:         <language code="en"/>
    OLAC_Display: English
    
    OLAC:         <language code="x-sil-ban">Dschang</language>
    OLAC_Display: Yemba, Dschang

  3. The OLAC element has the refine attribute.

    In this case the transformation places the display label for the refinement after the element content in square brackets. For instance,

    OLAC:         <relation refine="hasPart">oai:somearchive:holding126</relation>
    OLAC_Display: oai:somearchive:holding126 [Has part]
    
    OLAC:         <creator refine="editor">Sapir, Edward</creator>
    OLAC_Display: Sapir, Edward [editor]
    
    OLAC:         <title lang="x-sil-llu">Na tala 'uria na idulaa diana</title>
                  <title refine="alternative" lang="en">The road to good reading</title>
    OLAC_Display: Na tala 'uria na idulaa diana
                  The road to good reading [alternative]

  4. The OLAC element has the code attribute and the refine attribute.

    Date is the only element that may have both attributes. The transformation treats the code, content, and refinement as in the cases above. For instance,

    OLAC:         <date code="1950">circa</date>
    OLAC_Display: 1950, circa
    
    OLAC:         <date refine="modified" code="1996-10-16"/>
    OLAC_Display: 1996-10-16 [modified]

  5. The OLAC element has a scheme attribute.

    A scheme is typically ignored in making the transformation to the display format. For instance,

    OLAC:         <subject scheme="LCSH">African languages</subject>
    OLAC_Display: African languages

    However, a scheme that has the force of a user-defined refinement may be registered with OLAC along with a display label to be used when formatting an element for display. In this case, the label for the scheme is prefixed to the element content, followed by a colon. For instance,

    OLAC:         <format scheme="audioSamp">8 bit, 22 KHz</format>
    OLAC_Display: Audio sampling: 8 bit, 22 KHz
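
The five cases above can be sketched as a single Python function. This is an illustration under stated assumptions, not the Aggregator's implementation: the label tables (CODE_LABELS, REFINE_LABELS, REGISTERED_SCHEMES) and the function name display_content are hypothetical, the tables hold only the labels that appear in the examples above, and falling back to the raw code or refinement value when no label is known is an assumption made so that the date and creator examples work without a full vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical label tables holding only the labels used in the examples
# above. A real service would load the complete OLAC vocabularies.
CODE_LABELS = {
    ("language", "en"): "English",
    ("language", "x-sil-ban"): "Yemba",
}
REFINE_LABELS = {("relation", "hasPart"): "Has part"}
REGISTERED_SCHEMES = {"audioSamp": "Audio sampling"}


def display_content(xml_text):
    """Return the OLAC_Display content string for one OLAC element."""
    elem = ET.fromstring(xml_text)
    tag = elem.tag
    content = (elem.text or "").strip()

    # Cases 1, 2 and 4: the code label comes first, then the free content,
    # separated by a comma when both are present.
    parts = []
    code = elem.get("code")
    if code is not None:
        # Assumption: use the raw code when no label is known (e.g. dates).
        parts.append(CODE_LABELS.get((tag, code), code))
    if content:
        parts.append(content)
    display = ", ".join(parts)

    # Cases 3 and 4: the refinement label follows in square brackets.
    refine = elem.get("refine")
    if refine is not None:
        label = REFINE_LABELS.get((tag, refine), refine)
        display = f"{display} [{label}]".strip()

    # Case 5: only a registered scheme contributes a prefix; others are
    # ignored, as is the lang attribute.
    scheme = elem.get("scheme")
    if scheme in REGISTERED_SCHEMES:
        display = f"{REGISTERED_SCHEMES[scheme]}: {display}"
    return display
```

Applied to the examples in this section, display_content('<language code="x-sil-ban">Dschang</language>') yields "Yemba, Dschang", and display_content('<format scheme="audioSamp">8 bit, 22 KHz</format>') yields "Audio sampling: 8 bit, 22 KHz".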

4. An OLAC-to-OAI_DC crosswalk

The OLAC Aggregator also supports the OAI_DC metadata format. It functions as an OLAC-to-OAI_DC crosswalk since it harvests only OLAC metadata and performs the transformation to OAI_DC on request. Transforming a metadata record from OLAC format to OLAC_Display format goes most of the way toward implementing the OLAC-to-OAI_DC crosswalk. Two further changes must be made to transform OLAC_Display to OAI_DC:

  1. Remove all the attributes.

    This can be done without loss of information since the information in the attributes is already incorporated into the element content.

  2. Move refinements that are part of the tag name into the element content.

    When the OLAC element contains a refinement in the tag itself, this is treated like a registered scheme (as in the final case in the section above). A display label representing the refinement is prefixed to the element content. For instance,

    OLAC:   <format.sourcecode code="Java"/>
    OAI_DC: <format>Source code: Java</format>
    
    OLAC:   <type.functionality>Morphological parser</type.functionality>
    OAI_DC: <type>Software functionality: Morphological parser</type>
    
    OLAC:   <subject.language code="x-sil-ban">Dschang</subject.language>
    OAI_DC: <subject>Language: Yemba, Dschang</subject>
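
The two steps can be sketched in Python as follows, taking as input an element that is already in OLAC_Display form (so its content already carries the code label). The table TAG_REFINEMENT_LABELS and the function name to_oai_dc are hypothetical; the three labels are taken from the examples above, and falling back to the raw tag suffix for an unlisted refinement is an assumption. XML namespaces are omitted, as they are in the examples.

```python
import xml.etree.ElementTree as ET

# Labels taken from the examples above; the table and its name are
# hypothetical.
TAG_REFINEMENT_LABELS = {
    "format.sourcecode": "Source code",
    "type.functionality": "Software functionality",
    "subject.language": "Language",
}


def to_oai_dc(display_elem):
    """Convert one OLAC_Display element into a plain DC element."""
    tag = display_elem.tag
    content = display_elem.text or ""

    # Step 2: move a refinement in the tag name into the element content.
    base, dot, refinement = tag.partition(".")
    if dot:
        # Assumption: use the raw tag suffix when no label is registered.
        label = TAG_REFINEMENT_LABELS.get(tag, refinement)
        content = f"{label}: {content}"

    # Step 1: the new element is created without any attributes.
    dc_elem = ET.Element(base)
    dc_elem.text = content
    return dc_elem


source = ET.fromstring(
    '<subject.language code="x-sil-ban">Yemba, Dschang</subject.language>'
)
result = ET.tostring(to_oai_dc(source), encoding="unicode")
```

Here result holds the string <subject>Language: Yemba, Dschang</subject>, matching the last example above. Building a fresh element rather than editing the input in place keeps the harvested OLAC_Display record intact for other uses.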

References

[Day2001] Day, Michael. Mapping between metadata formats. UK Office for Library and Information Networking.
<http://www.ukoln.ac.uk/metadata/interoperability/>
[OAI_DC] Dublin Core Metadata Element Set, Version 1.1: Reference Description.
<http://dublincore.org/documents/1999/07/02/dces/>
XML schema for OAI implementation of Dublin Core metadata.
<http://www.openarchives.org/OAI/1.1/dc.xsd>
[OLACA] OLAC Aggregator Service.
<http://www.language-archives.org/cgi-bin/olaca.pl>
[OLACMS] OLAC Metadata Set.
<http://www.language-archives.org/OLAC/olacms.html>