OLAC Record: 2006 CoNLL Shared Task

OLAC Record
oai:www.ldc.upenn.edu:LDC2015T12

Metadata

Title: 2006 CoNLL Shared Task - Arabic & Czech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Charles University. 2006 CoNLL Shared Task - Arabic & Czech LDC2015T12. Web Download. Philadelphia: Linguistic Data Consortium, 2015

Contributor: Charles University

Date (W3CDTF): 2015

Date Issued (W3CDTF): 2015-06-15

Description: *Introduction*

2006 CoNLL Shared Task - Arabic & Czech consists of Arabic and Czech dependency treebanks used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing.

LDC also released the following 2006 & 2007 CoNLL Shared Task corpora:

* 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish (LDC2018T06)
* 2007 CoNLL Shared Task - Greek, Hungarian & Italian (LDC2018T07)
* 2006 CoNLL Shared Task - Ten Languages (LDC2015T11)

This corpus is cross-listed with ELRA as ELRA-W0087.

The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. In 2006, the shared task was devoted to the parsing of syntactic dependencies using corpora from up to thirteen languages. The task aimed to define and extend the then-current state of the art in dependency parsing, a technology that complemented previous tasks by producing a different kind of syntactic description of input text. More information about the 2006 shared task is available on the CoNLL-X web page.

LDC has released data sets from other CoNLL shared tasks. 2008 CoNLL Shared Task Data contains the material used in the 2008 shared task, which focused on English, employed a unified dependency-based formalism, and merged the tasks of syntactic dependency parsing, identifying semantic arguments, and labeling them with semantic roles. 2009 CoNLL Shared Task Data Parts 1 and 2 consist of the English, Catalan, Chinese, Czech, German and Spanish resources used in the 2009 task, which included a comparison of time and space complexity based on participants' input and a learning curve comparison for languages with large datasets.

LDC has also released the following CoNLL Shared Task data sets:

* 2006 CoNLL Shared Task - Ten Languages (LDC2015T11)
* 2008 CoNLL Shared Task Data (LDC2009T12)
* 2009 CoNLL Shared Task Part 1 (LDC2012T03)
* 2009 CoNLL Shared Task Part 2 (LDC2012T04)
* 2015-2016 CoNLL Shared Task (LDC2017T13)

*Data*

The source data in this release consists principally of news and journal texts. The individual data sets are subsets of the following:

* Prague Arabic Dependency Treebank (PADT) 1.0
* The Prague Dependency Treebank 1.0

*Samples*

Please view these Czech and Arabic samples.

*Updates*

None at this time.

Extent: Corpus size: 63312 KB

Identifier: LDC2015T12

Identifier: https://catalog.ldc.upenn.edu/LDC2015T12

ISBN: 1-58563-718-1

ISLRN: 798-485-294-792-1

DOI: 10.35111/9ez4-wy88

Language: Czech

Language: Standard Arabic

Language (ISO639): ces

Language (ISO639): arb

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2015T12

Rights Holder: Portions © 2000 Agence France Presse, © 2001 Al Hayat, © 2002 An Nahar, © 1994 Ceskomoravský Profit business weekly, © 1991, 1994, 1995 Lidové noviny daily newspapers, © 1992 Mladá fronta Dnes daily newspapers, © 1993-1996 Readers Digest, © 2002 Ummah Press Service, © 1992-1993 Vesmír scientific magazine, Academia Publishers, © 2003 Xinhua News Agency, © 1996-2001, 2002-2004 Center for Computational Linguistics & Institute for Formal and Applied Linguistics & Institute of Comparative Linguistics, Charles University in Prague, © 2000-2004, 2015 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2015T12

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Charles University. 2015. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CZ country_SA dcmi_Text iso639_arb iso639_ces olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T12
Up-to-date as of: Wed Oct 29 7:01:32 EDT 2025
