Date issued: | 2003-09-17 |
---|---|
Status of document: | Retired Standard. This document was once adopted by the community, but has now been superseded by a revised version. |
This version: | http://www.language-archives.org/OLAC/repositories-20030917.html |
Latest version: | http://www.language-archives.org/OLAC/repositories.html |
Previous version: | http://www.language-archives.org/OLAC/repositories-20030716.html |
Abstract: |
This document defines the standards OLAC archives must follow in implementing a metadata repository. |
Editors: |
Steven Bird, University of Melbourne and University of Pennsylvania (mailto:sb@csse.unimelb.edu.au) |
Changes since previous version: |
This document was adopted as an OLAC standard on 17 September 2003 by the OLAC Council. During the final review, minor changes were made for the sake of clarification; none of these changed the substance of the previous version. |
Copyright © 2003 Gary Simons (SIL International) and Steven Bird (University of Melbourne and University of Pennsylvania). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.
This OLAC standard on metadata repositories is based on the Open Archives Initiative protocol for metadata harvesting [OAI-PMH]. This document assumes familiarity with the OAI protocol. A metadata repository may take the form of a dynamic repository that implements a CGI interface to query a live database in response to protocol requests, or it may take the form of a static repository that has no interface of its own but is serviced through a static repository gateway [OAI-SR].
An OLAC metadata repository (whether static or dynamic) must supply two special description elements as part of the response to the Identify request. It must:
- Supply an OAI identifier description
- Supply an OLAC archive description
These elements are described in the next two sections. The final sections of the document describe the requirements for a static repository, the requirements for a dynamic repository, and the process of registering a repository with OLAC.
The resource identifiers supplied by an OLAC metadata repository must comply with the OAI specification for the format of OAI identifiers as defined in [OAI-Ids]. The metadata repository must document its compliance with this format by including an <oai-identifier> element within a <description> container in the Identify response.
The schema for validating an OAI identifier description is found at: http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
The target namespace is: http://www.openarchives.org/OAI/2.0/oai-identifier
The schema specifies fixed values of oai for the scheme element and : (colon) for the delimiter element. In addition to requiring validity with respect to the schema, OLAC places these further requirements on the content of the OAI identifier description:
- The repositoryIdentifier must be unique among all registered OLAC archives.
- The repositoryIdentifier must be based on a registered domain name, typically that of the sponsoring institution. A single institution may use subdomain names to distinguish metadata repositories that are internally distinct. A host institution may also use subdomain names to create identifiers for personal repositories.
- The sampleIdentifier must be the identifier of an existing item in the repository, not of a hypothetical item.
For example,
    <description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
                              http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
        <scheme>oai</scheme>
        <repositoryIdentifier>ethnologue.com</repositoryIdentifier>
        <delimiter>:</delimiter>
        <sampleIdentifier>oai:ethnologue.com:AAA</sampleIdentifier>
      </oai-identifier>
    </description>
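The content requirements above lend themselves to a quick local check before a repository is submitted. The following is a minimal sketch in Python, not part of the standard: the function name `check_oai_identifier` and the domain-name pattern are illustrative assumptions, and the check cannot establish that the repositoryIdentifier is unique or that the sample item really exists.

```python
import re
import xml.etree.ElementTree as ET

OAI_ID_NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"

# Assumption: a rough pattern for "looks like a domain name" (two or more labels).
DOMAIN_RE = re.compile(
    r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?"
    r"(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)+"
)

SAMPLE = """<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
    <scheme>oai</scheme>
    <repositoryIdentifier>ethnologue.com</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:ethnologue.com:AAA</sampleIdentifier>
  </oai-identifier>
</description>"""


def check_oai_identifier(description_xml):
    """Return a list of problems found in an <oai-identifier> description."""
    root = ET.fromstring(description_xml)
    node = next(root.iter(f"{{{OAI_ID_NS}}}oai-identifier"), None)
    if node is None:
        return ["missing <oai-identifier> element"]

    def text(name):
        # Text of a child element, or "" when the child is absent or empty.
        return node.findtext(f"{{{OAI_ID_NS}}}{name}", default="").strip()

    problems = []
    if text("scheme") != "oai":
        problems.append("scheme must be 'oai'")
    if text("delimiter") != ":":
        problems.append("delimiter must be ':'")
    repo_id = text("repositoryIdentifier")
    if not DOMAIN_RE.fullmatch(repo_id):
        problems.append("repositoryIdentifier does not look like a domain name")
    # The sample identifier should have the form oai:<repositoryIdentifier>:<local id>.
    prefix = f"oai:{repo_id}:"
    sample_id = text("sampleIdentifier")
    if not (sample_id.startswith(prefix) and len(sample_id) > len(prefix)):
        problems.append("sampleIdentifier does not match oai:<repositoryIdentifier>:<id>")
    return problems


print(check_oai_identifier(SAMPLE))
```

A check of this kind complements schema validation rather than replacing it.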
The basic Identify response supplies minimal information about an archive, namely, its name, base URL, and administrator email. An OLAC metadata repository must augment the Identify response by including an <olac-archive> element within a <description> container. This element gives additional information that makes it possible for an OLAC service provider to supply its users with a basic description of a participating archive.
The schema for validating an OLAC archive description is found at: http://www.language-archives.org/OLAC/1.0/olac-archive.xsd
The target namespace is: http://www.language-archives.org/OLAC/1.0/olac-archive
The <olac-archive> element has an obligatory attribute, type, which must have one of two values:
- type="institutional" indicates that the repository is sponsored and operated by an institution
- type="personal" indicates that the repository is sponsored and operated by an individual (or a group of individuals)
These are the elements within an OLAC archive description:
- archiveURL
Optional. The home page of the archive on the Web. This is the home page for human visitors, not the base URL for harvesting.
- curator
Obligatory. The name of the person who curates the archive collection. If more than one person has collaborated as personal sponsors of the archive, then this element should contain all the names in the order and format the collaborators want to be cited.
- curatorTitle
Optional. The job title of the curator within the sponsoring institution (for an institutional archive) or within the institution of affiliation (for a personal archive).
- curatorEmail
Optional. A mailto: URI giving the email address for contacting the curator of the archive. (Note that this is distinct from the <adminEmail> in the Identify response which is the contact address for the maintainer of the metadata repository.)
- institution
Obligatory. The name of the sponsoring institution (for an institutional archive) or the institution of affiliation (for a personal archive). If the curator of a personal archive has no affiliation, then a value of Unaffiliated should be given.
- institutionURL
Optional. A URL for the home page of the institution.
- shortLocation
Obligatory. A brief statement (not to exceed 50 characters) of the location of the institution or the person providing the metadata following the format "City, Country". Multiple locations may be connected with "and". This information is shown in the location column of the table of participating archives at http://www.language-archives.org/archives.php4.
- location
Optional. A single paragraph (of arbitrary length) describing where an archive that houses a collection of physical holdings is located (for instance, include building name, room number, street address). Other information relevant to visiting the collection, such as opening hours or restrictions on access, may also be described. If the archive is purely an on-line repository, do not use this element.
- synopsis
Obligatory. A single paragraph (of arbitrary length) summarizing the purpose, scope, coverage, and so on of the archive.
- access
Obligatory. A single paragraph (of arbitrary length) summarizing terms of access to the materials described in the metadata repository. The statement can describe restrictions on access, licensing requirements, costs, and so on. Individual metadata records should use the Rights element to document such things for particular archive holdings. The purpose of <access> is to broadly characterize the entire archive.
For example,
    <description>
      <olac-archive xmlns="http://www.language-archives.org/OLAC/1.0/olac-archive"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/olac-archive
                              http://www.language-archives.org/OLAC/1.0/olac-archive.xsd"
          type="institutional">
        <archiveURL>http://www.ethnologue.com</archiveURL>
        <curator>Raymond G. Gordon, Jr.</curator>
        <curatorTitle>Ethnologue Editor</curatorTitle>
        <curatorEmail>mailto:editor_ethnologue@sil.org</curatorEmail>
        <institution>SIL International</institution>
        <institutionURL>http://www.sil.org</institutionURL>
        <shortLocation>Dallas, USA</shortLocation>
        <location>7500 W. Camp Wisdom Rd., Dallas, TX 75236, U.S.A.</location>
        <synopsis>The Ethnologue repository gives a metadata record for every language entry in the Web edition of the Ethnologue. The latter provides basic information about each of the 7,000+ modern languages of the world (both living and recently extinct).</synopsis>
        <access>Every resource described by the Ethnologue metadata repository is a public Web page that may be accessed without restriction. Reuse of material on the site is subject to the Terms of Use that are posted on the site.</access>
      </olac-archive>
    </description>
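The obligatory and optional elements listed above can be checked mechanically. The sketch below is illustrative only: the function name `check_olac_archive` is hypothetical, the sample values are abridged from the example above, and the check covers only the type attribute, the presence of the obligatory elements, the 50-character limit on shortLocation, and the mailto: form of curatorEmail.

```python
import xml.etree.ElementTree as ET

OLAC_ARCHIVE_NS = "http://www.language-archives.org/OLAC/1.0/olac-archive"
OBLIGATORY = ("curator", "institution", "shortLocation", "synopsis", "access")

SAMPLE = """<description>
  <olac-archive xmlns="http://www.language-archives.org/OLAC/1.0/olac-archive"
      type="institutional">
    <curator>Raymond G. Gordon, Jr.</curator>
    <curatorEmail>mailto:editor_ethnologue@sil.org</curatorEmail>
    <institution>SIL International</institution>
    <shortLocation>Dallas, USA</shortLocation>
    <synopsis>Metadata for every language entry in the Ethnologue.</synopsis>
    <access>Public Web pages, accessible without restriction.</access>
  </olac-archive>
</description>"""


def check_olac_archive(description_xml):
    """Return a list of problems found in an <olac-archive> description."""
    root = ET.fromstring(description_xml)
    node = next(root.iter(f"{{{OLAC_ARCHIVE_NS}}}olac-archive"), None)
    if node is None:
        return ["missing <olac-archive> element"]

    def text(name):
        # Text of a child element, or "" when the child is absent or empty.
        return node.findtext(f"{{{OLAC_ARCHIVE_NS}}}{name}", default="").strip()

    problems = []
    if node.get("type") not in ("institutional", "personal"):
        problems.append("type must be 'institutional' or 'personal'")
    for name in OBLIGATORY:
        if not text(name):
            problems.append(f"missing obligatory element <{name}>")
    if len(text("shortLocation")) > 50:
        problems.append("shortLocation exceeds 50 characters")
    email = text("curatorEmail")
    if email and not email.startswith("mailto:"):
        problems.append("curatorEmail must be a mailto: URI")
    return problems


print(check_olac_archive(SAMPLE))
```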
A static repository is an XML document that describes the resources made available by a particular institution or individual. It is a convenient way to create a metadata repository for a relatively small collection (say, up to a couple thousand records). Such a document may be created and maintained manually by means of an XML editor. Alternatively, it might be generated periodically by a script that extracts information from an existing database.
The OAI specification for a static repository is given in [OAI-SR]. The schema for validating a static repository is found at:
In addition to being valid with respect to this schema, an OLAC static repository must also:
- Include an <oai-identifier> description and an <olac-archive> description in its <Identify> element.
- Contain the following element within its <ListMetadataFormats> element:
      <oai:metadataFormat>
        <oai:metadataPrefix>olac</oai:metadataPrefix>
        <oai:schema>http://www.language-archives.org/OLAC/1.0/olac.xsd</oai:schema>
        <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.0/</oai:metadataNamespace>
      </oai:metadataFormat>
- Contain a <ListRecords> element with the attribute metadataPrefix="olac" that holds at least one record, and in which every embedded record has a metadata description that conforms to the OLAC metadata standard [OLAC-Metadata].
A service for validating a repository for conformance to these requirements is found at:
An example of a complete OLAC static repository that conforms to these requirements is found at:
http://www.language-archives.org/OLAC/1.0/static-repository.xml
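The three OLAC-specific requirements for a static repository can be approximated with a short structural check. This is a sketch under stated assumptions, not the OLAC validation service: the function name is hypothetical, the namespace layout (container elements in the static-repository namespace, their children in the OAI-PMH namespace) is assumed from [OAI-SR], and the check only tests for the presence of an `<olac:olac>` element rather than validating it against the OLAC metadata schema.

```python
import xml.etree.ElementTree as ET

# Namespaces in ElementTree's {uri}tag notation (layout assumed from [OAI-SR]).
SR = "{http://www.openarchives.org/OAI/2.0/static-repository}"
OAI = "{http://www.openarchives.org/OAI/2.0/}"
OAI_ID = "{http://www.openarchives.org/OAI/2.0/oai-identifier}"
OLAC_ARCHIVE = "{http://www.language-archives.org/OLAC/1.0/olac-archive}"
OLAC = "{http://www.language-archives.org/OLAC/1.0/}"

SAMPLE = """<Repository xmlns="http://www.openarchives.org/OAI/2.0/static-repository"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <oai:repositoryName>Example Archive</oai:repositoryName>
    <oai:description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"/>
    </oai:description>
    <oai:description>
      <olac-archive xmlns="http://www.language-archives.org/OLAC/1.0/olac-archive"
          type="personal"/>
    </oai:description>
  </Identify>
  <ListMetadataFormats>
    <oai:metadataFormat>
      <oai:metadataPrefix>olac</oai:metadataPrefix>
    </oai:metadataFormat>
  </ListMetadataFormats>
  <ListRecords metadataPrefix="olac">
    <oai:record>
      <oai:header><oai:identifier>oai:example.org:1</oai:identifier></oai:header>
      <oai:metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.0/"/>
      </oai:metadata>
    </oai:record>
  </ListRecords>
</Repository>"""


def check_static_repository(xml_text):
    """Return a list of OLAC-specific problems in a static repository document."""
    root = ET.fromstring(xml_text)
    problems = []

    # Requirement 1: both descriptions appear inside <Identify>.
    identify = root.find(SR + "Identify")
    for tag, label in ((OAI_ID + "oai-identifier", "oai-identifier"),
                       (OLAC_ARCHIVE + "olac-archive", "olac-archive")):
        if identify is None or next(identify.iter(tag), None) is None:
            problems.append(f"Identify lacks an <{label}> description")

    # Requirement 2: the olac metadata format is declared.
    formats = root.find(SR + "ListMetadataFormats")
    prefixes = [] if formats is None else [
        (p.text or "").strip() for p in formats.iter(OAI + "metadataPrefix")]
    if "olac" not in prefixes:
        problems.append("ListMetadataFormats lacks the olac metadata format")

    # Requirement 3: at least one olac record, each with an <olac:olac> element.
    records = [rec
               for lr in root.findall(SR + "ListRecords")
               if lr.get("metadataPrefix") == "olac"
               for rec in lr.findall(OAI + "record")]
    if not records:
        problems.append('no ListRecords with metadataPrefix="olac" holding a record')
    for rec in records:
        if rec.find(f"{OAI}metadata/{OLAC}olac") is None:
            problems.append("record without an <olac:olac> description")
    return problems


print(check_static_repository(SAMPLE))
```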
A dynamic repository is harder to implement since it requires the implementation of a CGI interface for the complete OAI protocol for metadata harvesting [OAI-PMH]. This is necessary, however, when the collection is large and needs to implement flow control to keep protocol responses to a reasonable size. The OAI community considers half a megabyte to be a reasonable response size. If the ListRecords response for all records in a repository would substantially exceed that size, then it may be necessary to implement a dynamic repository with flow control.
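To illustrate the size guideline, the following sketch groups serialized records into batches that stay near the half-megabyte mark. It is an assumption-laden illustration: the function name `batch_records` and the byte limit constant are hypothetical, and a real dynamic repository would return each batch in a separate ListRecords response linked by the resumptionToken mechanism of [OAI-PMH].

```python
HALF_MEGABYTE = 512 * 1024  # assumed reading of "half a megabyte", in bytes


def batch_records(records, limit_bytes=HALF_MEGABYTE):
    """Yield lists of serialized records whose combined UTF-8 size fits the limit.

    A single record larger than the limit is still yielded, alone in its batch.
    """
    batch, size = [], 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        # Close the current batch before it would grow past the limit.
        if batch and size + record_size > limit_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_size
    if batch:
        yield batch


def needs_flow_control(records, limit_bytes=HALF_MEGABYTE):
    """True when the records cannot all be returned in one response of limit_bytes."""
    return sum(len(r.encode("utf-8")) for r in records) > limit_bytes


# Four records of 200,000 bytes each do not fit in one response.
records = ["x" * 200_000] * 4
print(needs_flow_control(records), [len(b) for b in batch_records(records)])
```

If `needs_flow_control` is false for the whole collection, a static repository is likely to be sufficient.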
The implementation of a dynamic OLAC metadata repository has all the features of a minimal OAI repository implementation (as defined in [OAI-GRI]), except that a dynamic OLAC repository need not support the oai_dc metadata format. This is because the OLAC Aggregator [OLACA] provides that service for repositories that comply with this standard; see [OLAC-Display] for the specification of the olac to oai_dc crosswalk that is implemented by the Aggregator. In fact, unless the institution has reasons of its own to function independently as an OAI data provider, OLAC recommends that a dynamic repository not implement the oai_dc metadata format so that the translation of OLAC metadata to the oai_dc format will be done consistently across the community.
Beyond the requirements of a minimal OAI repository implementation, a dynamic OLAC metadata repository must comply with the following requirements.
- The Identify response must include an <oai-identifier> description and an <olac-archive> description.
- The ListMetadataFormats response (when made with no additional parameters) must contain a specification for the olac metadata prefix that declares the schema and namespace for the version of OLAC metadata that is being used. For example,
      <oai:metadataFormat>
        <oai:metadataPrefix>olac</oai:metadataPrefix>
        <oai:schema>http://www.language-archives.org/OLAC/1.0/olac.xsd</oai:schema>
        <oai:metadataNamespace>http://www.language-archives.org/OLAC/1.0/</oai:metadataNamespace>
      </oai:metadataFormat>
- When the metadataPrefix argument to ListIdentifiers is specified as olac, the response must list at least one record.
- When the metadataPrefix argument to GetRecord is specified as olac, the <oai:metadata> element of the response must either be empty (when no OLAC metadata is available for the given identifier) or it must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. That element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the OLAC metadata schema that is being used.
- When the metadataPrefix argument to ListRecords is specified as olac, every <oai:metadata> element in the response must contain an <olac:olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. Each such element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the metadata schema that is being used.
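The GetRecord and ListRecords requirements can be spot-checked on a saved response. The sketch below is illustrative: the function name `olac_namespaces` and the sample response are hypothetical, and it reports only which namespace each `<olac>` element declares, without validating the element against the OLAC metadata schema.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.0/"/>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>"""


def olac_namespaces(response_xml):
    """For each <metadata> element, return the namespace URI of its <olac> child.

    None marks a <metadata> element with no namespaced <olac> child. That is
    acceptable in a GetRecord response but not in a ListRecords response.
    """
    root = ET.fromstring(response_xml)
    found = []
    for metadata in root.iter(OAI + "metadata"):
        child = next(iter(metadata), None)
        uri = None
        if child is not None and child.tag.startswith("{"):
            # ElementTree stores qualified names as "{namespace-uri}localname".
            namespace, _, local = child.tag[1:].partition("}")
            if local == "olac":
                uri = namespace
        found.append(uri)
    return found


print(olac_namespaces(SAMPLE))
```

For a ListRecords response, any None in the result points to a record that fails the requirement.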
When a request is made to register a metadata repository with OLAC, it is first tested for conformance to the requirements listed in the sections above. When these are met, the registration request is reviewed by the OLAC Council (see [OLAC-Process]) before final acceptance. The role of the Council in the registration process is to ensure that all registered archives meet the following guidelines concerning relevance and granularity.
Regarding relevance, in order to be eligible for registration as an OLAC archive:
The metadata repository must catalog language resources.
Regarding the granularity of repositories, a repository is meant to catalog all the holdings of an archive, rather than having separate repositories for each of the collections within an archive. Thus,
A given institution or individual should typically publish the metadata for all its resources in a single repository.
An exception is appropriate when distinct collections are managed in separate databases and thus require distinct software for implementing separate repositories.
Regarding the granularity of the records in a repository, the basic guideline is this:
A metadata repository should not degrade the "signal-to-noise ratio" for language resource discovery.
For instance, if a repository lists separate metadata records for all the computer files that comprise the documentation of a single linguistic event, then the effectiveness of searching will be degraded by all the duplicate records for the same documented event. Rather, the individual files should be listed as related components in a single metadata record. Similarly, if a repository lists separate metadata records for each of the 500 texts that make up a single corpus for a given language, then users searching for resources about that language will be swamped by 500 records for the same resource that will obscure the records for other resources that might be available. Rather, there should be one metadata record for the corpus as a whole, containing a link to the index on the host site that will allow interested users to explore the 500 texts.