Date issued: | 2008-07-28 |
---|---|
Status of document: | Standard. This document describes a standard that is currently followed by OLAC archives and services. |
This version: | http://www.language-archives.org/OLAC/repositories-20080728.html |
Supersedes: | http://www.language-archives.org/OLAC/repositories-20030917.html |
Latest version: | http://www.language-archives.org/OLAC/repositories.html |
Previous version: | http://www.language-archives.org/OLAC/repositories-20080531.html |
Abstract: |
This document defines the standards OLAC archives must follow in implementing a metadata repository. |
Editors: |
Steven Bird, University of Melbourne and University of Pennsylvania (mailto:sb@ldc.upenn.edu) |
Changes since previous version: |
This update to the standard describes the version 1.1 revision of the OLAC repository schemas. Apart from changing the version number from 1.0 to 1.1 throughout, the substantive changes are all in the OLAC archive description: the currentAsOf attribute is added; the <curator>, <curatorTitle>, and <curatorEmail> elements are replaced by a single, repeatable <participant> element; and an optional <archivalSubmissionPolicy> element is added. One of these changes gives rise to a new second requirement in the lists of requirements for both static and dynamic repositories, namely, that the person associated with the <adminEmail> must be identified in a <participant> element. Finally, the guidelines concerning relevance and granularity have been revised to define the standard for granularity in terms of shared provenance. (This version also incorporates corrections made in response to feedback during the Candidate testing phase.) |
Copyright © 2008 Gary Simons (SIL International and Graduate Institute of Applied Linguistics) and Steven Bird (University of Melbourne and University of Pennsylvania). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.
This OLAC standard on metadata repositories is based on the Open Archives Initiative protocol for metadata harvesting [OAI-PMH]. This document assumes familiarity with the OAI protocol. A metadata repository may take the form of a dynamic repository that implements a CGI interface to query a live database in response to protocol requests, or it may take the form of a static repository that has no interface of its own but is serviced through a static repository gateway [OAI-SR].
An OLAC metadata repository (whether static or dynamic) must include two special description elements in its response to the Identify request. It must:
Supply an OAI identifier description
Supply an OLAC archive description
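These elements are described in the next two sections. The final sections of the document describe the requirements on static repositories, the requirements on dynamic repositories, and the guidelines applied when a repository is registered with OLAC.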
The resource identifiers supplied by an OLAC metadata repository must comply with the OAI specification for the format of OAI identifiers as defined in [OAI-Ids]. The metadata repository must document its compliance with this format by including an <oai-identifier> element within a <description> container in the Identify response.
The schema for validating an OAI identifier description is found at:
The target namespace is: http://www.openarchives.org/OAI/2.0/oai-identifier
The schema specifies fixed values of oai for the scheme element and : (colon) for the delimiter element. In addition to being valid with respect to the schema, OLAC places these further requirements on the content of the OAI identifier description:
The repositoryIdentifier must be unique among all registered OLAC archives.
The repositoryIdentifier must be based on a registered domain name, typically of the sponsoring institution. A single institution may use subdomain names to distinguish metadata repositories that are internally distinct. A host institution may also use subdomain names to create identifiers for personal repositories.
The sampleIdentifier must be of an existing item in the repository, and not a hypothetical item.
For example,
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
    <scheme>oai</scheme>
    <repositoryIdentifier>ethnologue.com</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:ethnologue.com:aaa</sampleIdentifier>
  </oai-identifier>
</description>
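Most of these constraints can be checked mechanically; the exceptions are uniqueness among registered archives, ownership of the domain name, and whether the sampleIdentifier names an existing item, all of which need information beyond the description itself. The following Python sketch (standard library only; the function name check_oai_identifier is hypothetical and not part of any OLAC tool) parses a <description> like the one above and reports violations of the fixed scheme and delimiter values and of the oai:<repositoryIdentifier>:<local id> identifier form. It is not a substitute for schema validation.

```python
import xml.etree.ElementTree as ET

OAI_ID_NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"


def check_oai_identifier(description_xml: str) -> list[str]:
    """Return the problems found in a <description> holding an <oai-identifier>."""
    root = ET.fromstring(description_xml)
    node = root.find(f"{{{OAI_ID_NS}}}oai-identifier")
    if node is None:
        return ["no <oai-identifier> element inside <description>"]

    def text(name: str) -> str:
        child = node.find(f"{{{OAI_ID_NS}}}{name}")
        return (child.text or "").strip() if child is not None else ""

    problems = []
    # Fixed values imposed by the schema.
    if text("scheme") != "oai":
        problems.append("scheme must be 'oai'")
    if text("delimiter") != ":":
        problems.append("delimiter must be ':'")
    # OAI identifiers take the form oai:<repositoryIdentifier>:<local id>.
    prefix = f"oai:{text('repositoryIdentifier')}:"
    sample = text("sampleIdentifier")
    if not sample.startswith(prefix) or len(sample) == len(prefix):
        problems.append(f"sampleIdentifier should look like '{prefix}<local id>'")
    return problems


# The example above, without the xsi attributes.
EXAMPLE = """<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
    <scheme>oai</scheme>
    <repositoryIdentifier>ethnologue.com</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:ethnologue.com:aaa</sampleIdentifier>
  </oai-identifier>
</description>"""

print(check_oai_identifier(EXAMPLE))  # []
```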
The basic Identify request supplies minimal information about an archive, namely, its name, base URL, and administrator email. An OLAC metadata repository must augment the Identify response by including an <olac-archive> element within a <description> container. This element gives additional information that makes it possible for an OLAC service provider to supply its users with a basic description of a participating archive.
The schema for validating an OLAC archive description is found at:
The target namespace is: http://www.language-archives.org/OLAC/1.1/olac-archive
The <olac-archive> element has two obligatory attributes, type and currentAsOf. The type attribute must have one of two values:
type="institutional" indicates that the repository is operated by an institution that is committed to maintaining it in the future, even after the individuals currently associated with it are no longer involved.
type="personal" indicates that the repository is being operated by an individual (or a group of individuals) without the commitment of an institution for maintenance far into the future.
The currentAsOf attribute records the date on which this <olac-archive> description was last updated or, if no changes needed to be made, the date on which it was verified as holding current information. The attribute is obligatory and takes a date in the W3C date format [W3CDTF], which is a ten-character string of the form YYYY-MM-DD (e.g., 2008-04-19).
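A minimal Python sketch of checking these two attributes follows; the function name is hypothetical. It assumes that only the complete-date form YYYY-MM-DD is acceptable for currentAsOf, as stated above, and additionally rejects strings that match the pattern but are not real calendar dates.

```python
import re
from datetime import date


def check_archive_attributes(archive_type: str, current_as_of: str) -> list[str]:
    """Check the two obligatory attributes of <olac-archive>."""
    problems = []
    if archive_type not in ("institutional", "personal"):
        problems.append(f"type must be 'institutional' or 'personal', not {archive_type!r}")
    # W3CDTF complete date: exactly ten characters, YYYY-MM-DD.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", current_as_of):
        problems.append(f"currentAsOf must have the form YYYY-MM-DD, not {current_as_of!r}")
    else:
        try:
            # The pattern alone would accept impossible dates such as 2008-02-30.
            date.fromisoformat(current_as_of)
        except ValueError:
            problems.append(f"currentAsOf is not a real calendar date: {current_as_of!r}")
    return problems


print(check_archive_attributes("institutional", "2008-04-19"))  # []
print(check_archive_attributes("personal", "19/04/2008"))
```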
These are the elements that occur within an OLAC archive description, listed in the order in which they must appear:
- archiveURL
Optional. The home page of the archive on the Web. It may be omitted only if the archive does not have a web page. This is the home page for human visitors, not the base URL for harvesting.
- participant
Obligatory and repeatable. Use one instance of this element for each person who plays a significant role with respect to the repository. This must include the system administrator whose email address is given in the <oai:adminEmail> element of the Identify response. It should also include the curator of the archive, and may include any others who play some role. Identifying a participant in the archive description serves two functions: it provides contact information for the OLAC community, and it subscribes the participant to the automatically generated report on usage and quality metrics for the archive that is emailed quarterly. Thus anyone at the institution who wishes to receive this report should be listed as a participant.
name The name of the person who is associated in some way with the repository. Use the normal name form (i.e., uninverted).
role The job title of the participant, or a label for the role the person plays with respect to the repository.
email The email address for the participant.
- institution
Obligatory. The name of the sponsoring institution (for an institutional archive) or the institution of affiliation (for a personal archive). If the curator of a personal archive has no affiliation, then a value of Unaffiliated should be given.
- institutionURL
Optional. A URL for the home page of the institution.
- shortLocation
Obligatory. A brief statement (not to exceed 50 characters) of the location of the institution or the person providing the metadata following the format "City, Country". Multiple locations may be connected with "and". This information is shown in the location column of the table of participating archives at http://www.language-archives.org/archives.php.
- location
Optional. A single paragraph (of arbitrary length) describing where an archive that houses a collection of physical holdings is located (for instance, include building name, room number, street address). Other information relevant to visiting the collection, such as opening hours or restrictions on access, may also be described. If the archive is purely an on-line repository, do not use this element.
- synopsis
Obligatory. A single paragraph (of arbitrary length) summarizing the purpose, scope, coverage, and so on of the archive.
- access
Obligatory. A single paragraph (of arbitrary length) summarizing terms of access to the materials described in the metadata repository. The statement can describe restrictions on access, licensing requirements, costs, and so on. Individual metadata records should use the Rights element to document such things for particular archive holdings. The purpose of <access> is to broadly characterize the entire archive.
- archivalSubmissionPolicy
Optional. A single paragraph (of arbitrary length) describing the institution's policy toward accepting archival submissions. The presence of this element indicates that the repository is an archive that accepts submissions of materials for long-term preservation. The element content should describe the collection policy of the archive (e.g., what kinds of materials are accepted from whom under what terms) so that a person looking for a place to archive a set of language resources may determine whether it would be appropriate to contact the curator about making a submission. A repository that does not accept materials for long-term preservation must not use this element. All institutions that provide an archival submission policy are listed with their policy statement in a page aimed at assisting those looking for a place to archive language resources: http://www.language-archives.org/submission-policies.php.
For example,
<description>
  <olac-archive type="institutional" currentAsOf="2008-04-19"
      xmlns="http://www.language-archives.org/OLAC/1.1/olac-archive"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.language-archives.org/OLAC/1.1/olac-archive http://www.language-archives.org/OLAC/1.1/olac-archive.xsd">
    <archiveURL>http://www.ethnologue.com/bibliography.asp</archiveURL>
    <participant name="Vurnell Cobbey" role="Archives director (acting)" email="archive_dallas@sil.org"/>
    <participant name="Joan Spanne" role="Database administrator" email="joan_spanne@sil.org"/>
    <institution>SIL International</institution>
    <institutionURL>http://www.sil.org</institutionURL>
    <shortLocation>Dallas, USA</shortLocation>
    <location>7500 W. Camp Wisdom Rd., Dallas, TX 75236, U.S.A.</location>
    <synopsis>The SIL International Language and Culture Archives holds works authored or edited by members of SIL International or produced by a publishing unit of SIL. It houses over 13,000 books, journal articles, book chapters, dissertations, and other academic papers about languages and cultures. It also has about 8,000 items written in the languages studied, such as literacy primers, books on basic education topics (health, math, social studies), story books, and translated works. The vast majority of works are published. The materials date from 1935 to the present.</synopsis>
    <access>Links are given to publications that are directly accessible via the Internet. Recent SIL publications may be purchased from the International Academic Bookstore (Academic_Books AT sil.org), either in paper or in electronic form. Out-of-print SIL publications may be obtained by special order. All materials may be viewed by visiting the Archives by appointment during normal business hours.</access>
    <archivalSubmissionPolicy>The SIL International Language and Culture Archives accepts submissions from active and retired SIL staff in the areas of language and culture documentation and description, and language-based development. Under some circumstances, the Archives will also accept materials from former staff and persons more casually associated with SIL language work, if such materials relate to research done with the assistance of SIL or its staff, and there is not a more appropriate institution able to accept and curate the materials long-term. Please address any questions to the Archives by sending email to archive_dallas AT sil.org.</archivalSubmissionPolicy>
  </olac-archive>
</description>
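The ordering and obligatoriness rules in the element list above can be checked with a short Python sketch such as the following (function and constant names are hypothetical). It matches elements by local name, so it does not check the namespace, and it makes two assumptions not spelled out in the list: every element other than <participant> may appear at most once, and each <participant> carries all three of its name, role, and email attributes. It also enforces the 50-character limit on <shortLocation>.

```python
import xml.etree.ElementTree as ET

# Order and obligatoriness as listed in the section above.
ELEMENT_ORDER = ["archiveURL", "participant", "institution", "institutionURL",
                 "shortLocation", "location", "synopsis", "access",
                 "archivalSubmissionPolicy"]
OBLIGATORY = {"participant", "institution", "shortLocation", "synopsis", "access"}
# Assumption: only <participant> is repeatable.
SINGLE = set(ELEMENT_ORDER) - {"participant"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def check_archive_children(olac_archive: ET.Element) -> list[str]:
    problems = []
    names = [local_name(child.tag) for child in olac_archive]
    positions = []
    for name in names:
        if name in ELEMENT_ORDER:
            positions.append(ELEMENT_ORDER.index(name))
        else:
            problems.append(f"unexpected element <{name}>")
    if positions != sorted(positions):
        problems.append("elements are out of order")
    for name in sorted(OBLIGATORY - set(names)):
        problems.append(f"missing obligatory <{name}>")
    for name in sorted(SINGLE):
        if names.count(name) > 1:
            problems.append(f"<{name}> may appear only once")
    for child in olac_archive:
        name = local_name(child.tag)
        if name == "shortLocation" and len((child.text or "").strip()) > 50:
            problems.append("shortLocation exceeds 50 characters")
        if name == "participant":
            for attr in ("name", "role", "email"):
                if not child.get(attr):
                    problems.append(f"participant lacks a {attr} attribute")
    return problems


# Hypothetical personal archive with <synopsis> placed too early.
SAMPLE = """<olac-archive xmlns="http://www.language-archives.org/OLAC/1.1/olac-archive"
    type="personal" currentAsOf="2008-04-19">
  <participant name="A. Person" role="Curator" email="a@example.org"/>
  <institution>Unaffiliated</institution>
  <synopsis>Personal field notes.</synopsis>
  <shortLocation>Dallas, USA</shortLocation>
  <access>Open.</access>
</olac-archive>"""

print(check_archive_children(ET.fromstring(SAMPLE)))  # ['elements are out of order']
```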
A static repository is an XML document that describes the resources made available by a particular institution or individual. It is a convenient way to create a metadata repository for a relatively small collection (say, up to a couple thousand records). Such a document may be created and maintained manually by means of an XML editor. Alternatively, it might be generated periodically by a script that extracts information from an existing database.
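As an illustration of the scripted approach, here is a minimal Python sketch that builds only the <ListRecords metadataPrefix="olac"> portion of a static repository from a database. Everything about the database is hypothetical: an in-memory sqlite3 table named items, with local_id, datestamp, and title columns, stands in for an archive catalogue. The static-repository namespace URI is taken from [OAI-SR] as we understand it, and each <olac:olac> record carries only a dc:title; a real script would also need to emit the <Identify> and <ListMetadataFormats> sections, schemaLocation attributes, and full metadata conforming to [OLAC-Metadata].

```python
import sqlite3
import xml.etree.ElementTree as ET

SR = "http://www.openarchives.org/OAI/2.0/static-repository"  # per [OAI-SR], as assumed here
OAI = "http://www.openarchives.org/OAI/2.0/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", SR)
ET.register_namespace("oai", OAI)
ET.register_namespace("olac", OLAC)
ET.register_namespace("dc", DC)


def list_records(conn: sqlite3.Connection, repo_id: str) -> ET.Element:
    """Build <ListRecords metadataPrefix="olac"> from the hypothetical 'items' table."""
    list_el = ET.Element(f"{{{SR}}}ListRecords", metadataPrefix="olac")
    rows = conn.execute("SELECT local_id, datestamp, title FROM items ORDER BY local_id")
    for local_id, datestamp, title in rows:
        record = ET.SubElement(list_el, f"{{{OAI}}}record")
        header = ET.SubElement(record, f"{{{OAI}}}header")
        ET.SubElement(header, f"{{{OAI}}}identifier").text = f"oai:{repo_id}:{local_id}"
        ET.SubElement(header, f"{{{OAI}}}datestamp").text = datestamp
        metadata = ET.SubElement(record, f"{{{OAI}}}metadata")
        olac_el = ET.SubElement(metadata, f"{{{OLAC}}}olac")
        ET.SubElement(olac_el, f"{{{DC}}}title").text = title
    return list_el


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (local_id TEXT, datestamp TEXT, title TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("item1", "2008-04-19", "Word list"),
    ("item2", "2008-05-02", "Story recordings"),
])
print(ET.tostring(list_records(conn, "example.org"), encoding="unicode"))
```

Regenerating the whole document on a schedule keeps the script simple; the static repository gateway [OAI-SR] then takes care of answering protocol requests.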
The OAI specification for a static repository is given in [OAI-SR]. The schema for validating a static repository is found at:
In addition to being valid with respect to this schema, an OLAC static repository must also:
Include an <oai-identifier> description and an <olac-archive> description in its <Identify> element.
Include a <participant> element within the <olac-archive> description with an email address that exactly matches the <adminEmail> within the <Identify> element.
Contain the following element within its <ListMetadataFormats> element:
<oai:metadataFormat>
  <oai:metadataPrefix>olac</oai:metadataPrefix>
  <oai:schema>http://www.language-archives.org/OLAC/1.1/olac.xsd</oai:schema>
  <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.1/</oai:metadataNamespace>
</oai:metadataFormat>
Contain a <ListRecords> element with the attribute metadataPrefix="olac" that contains at least one record and in which every embedded record has a metadata description that conforms to the OLAC metadata standard [OLAC-Metadata].
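The OLAC-specific requirements listed above, beyond schema validity, can be sketched as a Python check like the following (function names are hypothetical, and this is not the validation service mentioned below). To avoid depending on namespace details, it matches elements by local name only; it does not validate the records against [OLAC-Metadata].

```python
import xml.etree.ElementTree as ET
from typing import Optional


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def first(parent: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the first element (parent included) with the given local name."""
    return next((el for el in parent.iter() if local_name(el.tag) == name), None)


def check_static_repository(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    identify = first(root, "Identify")
    if identify is None:
        return ["no <Identify> element"]
    problems = []
    # Requirement 1: both description elements.
    for name in ("oai-identifier", "olac-archive"):
        if first(identify, name) is None:
            problems.append(f"<Identify> lacks an <{name}> description")
    # Requirement 2: some participant email exactly equals adminEmail.
    admin = first(identify, "adminEmail")
    admin_email = (admin.text or "").strip() if admin is not None else None
    emails = {el.get("email") for el in identify.iter() if local_name(el.tag) == "participant"}
    if admin_email is None or admin_email not in emails:
        problems.append("no <participant> email exactly matches <adminEmail>")
    # Requirement 3: the olac metadata prefix is declared.
    formats = first(root, "ListMetadataFormats")
    prefixes = set()
    if formats is not None:
        prefixes = {(el.text or "").strip() for el in formats.iter()
                    if local_name(el.tag) == "metadataPrefix"}
    if "olac" not in prefixes:
        problems.append("<ListMetadataFormats> does not declare the olac prefix")
    # Requirement 4 (record-count part only): an olac ListRecords with a record.
    olac_lists = [el for el in root.iter()
                  if local_name(el.tag) == "ListRecords" and el.get("metadataPrefix") == "olac"]
    if not any(local_name(rec.tag) == "record" for lst in olac_lists for rec in lst):
        problems.append('no <ListRecords metadataPrefix="olac"> containing a record')
    return problems


# Namespace-free skeleton, for brevity only.
SAMPLE = """<Repository>
  <Identify>
    <adminEmail>admin@example.org</adminEmail>
    <description><oai-identifier/></description>
    <description><olac-archive>
      <participant name="A. Person" role="Administrator" email="admin@example.org"/>
    </olac-archive></description>
  </Identify>
  <ListMetadataFormats><metadataFormat><metadataPrefix>olac</metadataPrefix></metadataFormat></ListMetadataFormats>
  <ListRecords metadataPrefix="olac"><record/></ListRecords>
</Repository>"""

print(check_static_repository(SAMPLE))  # []
```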
A service for validating a repository for conformance to these requirements is found at:
An example of a complete OLAC static repository that conforms to these requirements is found at:
http://www.language-archives.org/OLAC/1.1/static-repository.xml
A dynamic repository is harder to implement because it requires a CGI interface that implements the complete OAI protocol for metadata harvesting [OAI-PMH]. It is necessary, however, when the collection is large enough that the repository must use flow control (by means of resumption tokens) to keep protocol responses to a reasonable size. The OAI community considers half a megabyte to be a reasonable response size. If the ListRecords response for all records in a repository would substantially exceed that size, then it may be necessary to implement a dynamic repository with flow control.
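The size test and the paging it implies can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the standard: "half a megabyte" is taken as 512 × 1024 bytes, each record is assumed to be already serialized to bytes, and the overhead of the OAI-PMH response envelope is ignored. In a real dynamic repository, each page after the first would be requested by the harvester with the resumptionToken returned in the previous response.

```python
from typing import Iterator

# Assumption: "half a megabyte" interpreted as 512 KiB.
MAX_RESPONSE_BYTES = 512 * 1024


def needs_flow_control(serialized_records: list[bytes],
                       max_bytes: int = MAX_RESPONSE_BYTES) -> bool:
    """True if a single ListRecords response holding every record would be too large."""
    return sum(len(rec) for rec in serialized_records) > max_bytes


def pages(serialized_records: list[bytes],
          max_bytes: int = MAX_RESPONSE_BYTES) -> Iterator[list[bytes]]:
    """Group records into consecutive pages of at most max_bytes each.

    A single record larger than max_bytes still gets a page of its own.
    """
    page: list[bytes] = []
    size = 0
    for rec in serialized_records:
        if page and size + len(rec) > max_bytes:
            yield page
            page, size = [], 0
        page.append(rec)
        size += len(rec)
    if page:
        yield page


records = [b"x" * 200_000] * 5  # five hypothetical 200 KB records
print(needs_flow_control(records))     # True
print([len(p) for p in pages(records)])  # [2, 2, 1]
```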
The implementation of a dynamic OLAC metadata repository has all the features of a minimal OAI repository implementation (as defined in [OAI-GRI]), except that a dynamic OLAC repository need not support the oai_dc metadata format. This is because the OLAC Aggregator [OLACA] provides that service for repositories that comply with this standard; see [OLAC-Display] for the specification of the olac to oai_dc crosswalk that is implemented by the Aggregator. In fact, unless the institution has reasons of its own to function independently as an OAI data provider, OLAC recommends that a dynamic repository not implement the oai_dc metadata format so that the translation of OLAC metadata to the oai_dc format will be done consistently across the community.
Beyond the requirements of a minimal OAI repository implementation, a dynamic OLAC metadata repository must also comply with the following requirements.
The Identify response must include an <oai-identifier> description and an <olac-archive> description.
The <olac-archive> description must include a <participant> element with an email address that exactly matches the <adminEmail> within the Identify response.
The ListMetadataFormats response (when made with no additional parameters) must contain a specification for the olac metadata prefix that declares the schema and namespace for the version of OLAC metadata that is being used. For example,
<oai:metadataFormat>
  <oai:metadataPrefix>olac</oai:metadataPrefix>
  <oai:schema>http://www.language-archives.org/OLAC/1.1/olac.xsd</oai:schema>
  <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.1/</oai:metadataNamespace>
</oai:metadataFormat>
When the metadataPrefix argument to ListIdentifiers is specified as olac, the response must include at least one record.
When the metadataPrefix argument to GetRecord is specified as olac, the <oai:metadata> element of the response must either be empty (when no OLAC metadata is available for the given identifier) or it must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. That element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the OLAC metadata schema that is being used.
When the metadataPrefix argument to ListRecords is specified as olac, every <oai:metadata> element in the response must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. Each such element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the metadata schema that is being used.
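A harvester-side spot check of the GetRecord and ListRecords requirements might look like the following Python sketch. The base URL and function names are hypothetical, and the check is deliberately shallow: it confirms only that each <oai:metadata> element is empty (allowed for GetRecord) or holds an <olac> element in an OLAC namespace, not that the element validates against [OLAC-Metadata]. ElementTree exposes the namespace through the qualified tag name rather than through the xmlns attribute itself. The assumption that every OLAC metadata version uses a namespace URI under http://www.language-archives.org/OLAC/ is ours.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# Assumption: every OLAC metadata version has a namespace URI under this prefix.
OLAC_NS_PREFIX = "http://www.language-archives.org/OLAC/"


def request_url(base_url: str, verb: str, **args: str) -> str:
    """Compose an OAI-PMH GET request URL."""
    return base_url + "?" + urlencode({"verb": verb, **args})


def check_metadata(response_xml: str, allow_empty: bool) -> list[str]:
    """Check every <oai:metadata> element in a GetRecord or ListRecords response.

    Pass allow_empty=True for GetRecord, where <oai:metadata> may be empty;
    pass allow_empty=False for ListRecords.
    """
    problems = []
    root = ET.fromstring(response_xml)
    for i, metadata in enumerate(root.iter(f"{{{OAI_NS}}}metadata")):
        children = list(metadata)
        if not children:
            if not allow_empty:
                problems.append(f"metadata #{i} is empty")
            continue
        tag = children[0].tag  # ElementTree writes qualified names as '{uri}local'
        uri, _, local = tag[1:].partition("}") if tag.startswith("{") else ("", "", tag)
        if local != "olac" or not uri.startswith(OLAC_NS_PREFIX):
            problems.append(f"metadata #{i} holds {tag!r}, not an OLAC <olac> element")
    return problems


print(request_url("http://www.example.org/cgi-bin/oai", "ListRecords", metadataPrefix="olac"))
# http://www.example.org/cgi-bin/oai?verb=ListRecords&metadataPrefix=olac

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata><olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"/></metadata></record>
    <record><metadata/></record>
  </ListRecords>
</OAI-PMH>"""
print(check_metadata(SAMPLE, allow_empty=False))  # ['metadata #1 is empty']
```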
When a request is made to register a metadata repository with OLAC, it is first tested for conformance to the requirements listed in the sections above. When these are met, the registration request is reviewed by the OLAC Council (see [OLAC-Process]) before final acceptance. The role of the Council in the registration process is to ensure that all registered archives meet the following guidelines concerning relevance and granularity.
Regarding relevance, in order to be eligible for registration as an OLAC archive:
The metadata repository must catalog language resources.
Regarding the granularity of repositories, a repository is meant to catalog all the holdings of an archive, rather than having separate repositories for each of the collections within an archive. Thus,
A given institution or individual should typically publish the metadata for all its resources in a single repository.
An exception is appropriate when distinct collections are managed in separate databases and because of this require distinct software for implementing separate repositories.
Regarding the granularity of the records in a repository, the basic guideline is this:
A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record.
For published resources, the publication unit typically constitutes the appropriate unit for the OLAC metadata record. Unpublished papers presenting research findings closely parallel typical published works and can be treated at a comparable level in an OLAC metadata record. For primary source materials (e.g., recordings, transcriptions, annotations, notes, data sets), the typical practice of archivists is to gather such materials into collections based on shared provenance, that is, on having a common origin and history. These collections are then the primary units for description in OLAC metadata records.
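To make the provenance principle concrete, the following toy Python sketch groups hypothetical catalogue items by a provenance key and yields one prospective OLAC record per group. The item identifiers, the provenance labels, and the idea of a single provenance field are all invented for illustration; in practice, deciding which items share a common origin and history is an archival judgment, not a lookup.

```python
from collections import defaultdict

# Hypothetical catalogue rows: (item id, provenance key, description).
# The provenance key stands for "common origin and history", e.g. one
# fieldworker's materials from one project.
ITEMS = [
    ("tape-01", "fieldwork-1975", "Audio recording, village A"),
    ("tape-02", "fieldwork-1975", "Audio recording, village B"),
    ("notes-01", "fieldwork-1975", "Field notebook"),
    ("lex-01", "dictionary-project", "Lexical database export"),
]


def records_by_provenance(rows):
    """Return one prospective OLAC record per shared provenance, listing its items."""
    groups = defaultdict(list)
    for item_id, provenance, _description in rows:
        groups[provenance].append(item_id)
    return dict(groups)


print(records_by_provenance(ITEMS))
# {'fieldwork-1975': ['tape-01', 'tape-02', 'notes-01'], 'dictionary-project': ['lex-01']}
```

Here four primary-source items yield two metadata records rather than four.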
See Section 5 of the OLAC Metadata Usage Guidelines [OLAC-Usage] for a more in-depth discussion of the principle of provenance as applied to collections and metadata within the OLAC context.