OLAC Record
oai:www.ldc.upenn.edu:LDC2017T09

Metadata
Title:The EventStatus Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Huang, Ruihong, Daniel Jurafsky, and Ellen Riloff. The EventStatus Corpus LDC2017T09. Web Download. Philadelphia: Linguistic Data Consortium, 2017
Contributor:Huang, Ruihong
Jurafsky, Daniel
Riloff, Ellen
Date (W3CDTF):2017
Date Issued (W3CDTF):2017-05-15
Description:*Introduction* The EventStatus Corpus was developed by researchers at Texas A&M University, Stanford University and the University of Utah. It consists of approximately 3,000 English and 1,500 Spanish news articles about civil unrest events annotated with temporal tags. The corpus was designed to support the study of the temporal and aspectual properties of major events, that is, whether an event has already happened, is currently happening or may happen in the future. Because it focuses on a single domain (civil unrest events), it may be appropriate for tasks such as event extraction and temporal question answering. *Data* The news articles were sourced from English Gigaword Fifth Edition (LDC2011T07) and Spanish Gigaword Third Edition (LDC2011T12). The civil unrest events include protests, demonstrations, marches and strikes. The data was annotated as PAST, ON-GOING or FUTURE and, within each of those categories, as PLANNED, ALERT or POSSIBLE. In addition to the annotated articles, the file lists used in experiments for tuning and testing are included. Experiments used 10-fold cross-validation, and the specific 10-fold splits of the test set are included as well. All text is presented as plain text and encoded in UTF-8. *Updates* None at this time.
Extent:Corpus size: 39080 KB
Identifier:LDC2017T09
https://catalog.ldc.upenn.edu/LDC2017T09
ISBN: 1-58563-800-5
ISLRN: 173-931-115-382-5
DOI: 10.35111/n0mc-6m82
Language:English
Spanish
Language (ISO639):eng
spa
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2017T09
Rights Holder:Portions © 1994-2010 Agence France Presse, © 1993-2010 The Associated Press, © 1997-2010 Central News Agency (Taiwan), © 1994-1998, 2003-2009 Los Angeles Times-Washington Post News Service, Inc., © 1994-2010 New York Times, © 2010 The Washington Post News Service with Bloomberg News, © 1995-2010 Xinhua News Agency, © 2017 Ruihong Huang, © 2003, 2005, 2006, 2007, 2009, 2011, 2013, 2017 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2017T09
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Huang, Ruihong; Jurafsky, Daniel; Riloff, Ellen. 2017. The EventStatus Corpus. Linguistic Data Consortium.
Terms: area_Europe country_ES country_GB dcmi_Text iso639_eng iso639_spa olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2017T09
Up-to-date as of: Fri Dec 6 7:48:38 EST 2024