OLAC Record
oai:www.ldc.upenn.edu:LDC2010T09

Metadata
Title:ACE 2005 Mandarin SpatialML Annotations
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Wang, Xiaoman, et al. ACE 2005 Mandarin SpatialML Annotations LDC2010T09. Web Download. Philadelphia: Linguistic Data Consortium, 2010
Contributor:Wang, Xiaoman
Doran, Christine
Hitzeman, Janet
Mani, Inderjeet
Date (W3CDTF):2010
Date Issued (W3CDTF):2010-05-14
Description:*Introduction* ACE 2005 Mandarin SpatialML Annotations was developed by researchers at The MITRE Corporation (MITRE). It applies SpatialML tags to a subset of the source Mandarin training data in ACE 2005 Multilingual Training Corpus (LDC2006T06). Annotations for entities, relations, and events, which were included in ACE 2005 Multilingual Training Corpus, are not included in the current SpatialML release. For SpatialML markup of ACE 2005 English data, see ACE 2005 English SpatialML Annotations (LDC2008T03). SpatialML is a markup language for representing spatial expressions in natural language documents. Its focus is on geography and culturally relevant landmarks rather than on biology, cosmology, geology, or other regions of the spatial language domain. The goal is to allow better integration of text collections with resources such as databases that provide spatial information about a domain, including gazetteers, physical feature databases, and mapping services. The ACE (Automatic Content Extraction) Program seeks to develop extraction technology to support automatic processing of source language data, in the form of natural text and of text derived from automatic speech recognition and optical character recognition. This includes classification, filtering, and selection based on the language content of the source data, i.e., based on the meaning conveyed by the data. Thus the ACE program requires the development of technologies that automatically detect and characterize this meaning. The annotation efforts of the ACE program support the development of automatic content extraction technology for processing human language in text form.
The kind of information recognized and extracted from text includes entities, values, temporal expressions, relations, and events. The SpatialML annotation scheme is intended to emulate earlier progress on annotating time expressions, such as TIMEX2, TimeML, and the 2005 ACE guidelines. The main SpatialML tag is the PLACE tag, which encodes information about location. The central goal of SpatialML is to map location information in text to data from gazetteers and other databases, to the extent possible, by defining attributes in the PLACE tag. Semantic attributes such as country abbreviations, country subdivision and dependent area abbreviations (e.g., US states), and geo-coordinates are therefore used to help establish such a mapping. LINK and PATH tags express relations between places, such as inclusion relations and trajectories of various kinds. The information in the tag, along with the tagged location string, should be sufficient to uniquely determine the mapping when such a mapping is possible. This also means that redundant information is not included in the tag. To the extent possible, SpatialML leverages ISO and other standards, with the goal of making the scheme compatible with existing and future corpora. The SpatialML guidelines are compatible with existing guidelines for spatial annotation and with existing corpora within the ACE research program.
*Data* This corpus consists of a 298-document subset of broadcast material from the ACE 2005 Multilingual Training Corpus (LDC2006T06) that has been tagged by a native Mandarin speaker according to version 2.3 of the SpatialML annotation guidelines, which are included in the documentation for this release.
*Updates* No updates have been issued at this time.
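To illustrate how PLACE attributes and LINK relations described above fit together, here is a minimal Python sketch that parses a small, hypothetical SpatialML-style fragment with the standard library's xml.etree.ElementTree. The fragment itself, its attribute names and values (type, country, latLong, source, target, linkType, PPLC, IN), and the helper functions extract_places and extract_links are assumptions made for illustration and are not taken from this release; the SpatialML 2.3 guidelines in the release documentation define the actual tag and attribute inventory.

```python
import xml.etree.ElementTree as ET

# Hypothetical SpatialML-style fragment (an assumption, not corpus data).
# Attribute names and values are illustrative; consult the SpatialML 2.3
# guidelines shipped with LDC2010T09 for the real inventory.
SAMPLE = """<doc>
<PLACE id="loc1" type="PPLC" country="CN" latLong="39.9N 116.4E">北京</PLACE>位于<PLACE id="loc2" type="COUNTRY" country="CN">中国</PLACE>。
<LINK id="link1" source="loc1" target="loc2" linkType="IN"/>
</doc>"""


def extract_places(root):
    """Map each PLACE id to its tagged string plus its remaining attributes."""
    places = {}
    for el in root.iter("PLACE"):
        attrs = dict(el.attrib)
        place_id = attrs.pop("id")  # id is the key, so drop it from the value
        places[place_id] = {"text": el.text or "", **attrs}
    return places


def extract_links(root, places):
    """Resolve each LINK's source/target ids to the tagged place strings."""
    links = []
    for el in root.iter("LINK"):
        src = places[el.get("source")]["text"]
        tgt = places[el.get("target")]["text"]
        links.append((src, el.get("linkType"), tgt))
    return links


root = ET.fromstring(SAMPLE)
places = extract_places(root)
links = extract_links(root, places)
print(places["loc1"])
print(links)
```

Keeping the attributes next to the tagged string reflects the design stated above: the string together with its PLACE attributes is what a gazetteer lookup could use to pick out a unique entry, while LINK only refers to places by id.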
Extent:Corpus size: 2764 KB
Identifier:LDC2010T09
https://catalog.ldc.upenn.edu/LDC2010T09
ISBN: 1-58563-546-4
ISLRN: 951-452-048-245-8
DOI: 10.35111/pkce-3b81
Language:Mandarin Chinese
Language (ISO639):cmn
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2010T09
Rights Holder:Portions © 2000-2001 China Broadcasting System, © 2000-2001 China Central TV, © 2000-2001 China National Radio, © 2000-2001 China Television System, © 2008-2009 The MITRE Corporation, © 2005, 2006, 2010 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2010T09
DateStamp:  2020-11-30

Search Info

Citation: Wang, Xiaoman; Doran, Christine; Hitzeman, Janet; Mani, Inderjeet. 2010. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2010T09
Up-to-date as of: Fri Dec 6 7:47:54 EST 2024