OLAC Record oai:www.ldc.upenn.edu:LDC2012T08 |
Metadata | ||
Title: | Prague Czech-English Dependency Treebank 2.0 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Hajič, Jan, et al. Prague Czech-English Dependency Treebank 2.0 LDC2012T08. Web Download. Philadelphia: Linguistic Data Consortium, 2012 | |
Contributor: | Hajič, Jan | |
Hajičová, Eva | ||
Panevová, Jarmila | ||
Sgall, Petr | ||
Cinková, Silvie | ||
Fučíková, Eva | ||
Mikulová, Marie | ||
Pajas, Petr | ||
Popelka, Jan | ||
Semecký, Jiří | ||
Šindlerová, Jana | ||
Štěpánek, Jan | ||
Toman, Josef | ||
Urešová, Zdeňka | ||
Žabokrtský, Zdeněk | ||
Date (W3CDTF): | 2012 | |
Date Issued (W3CDTF): | 2012-06-15 | |
Description: | *Introduction*
Prague Czech-English Dependency Treebank (PCEDT) 2.0 was developed by the Institute of Formal and Applied Linguistics at Charles University in Prague, Czech Republic. It is a corpus of Czech-English parallel resources that have been translated, aligned, and manually annotated for dependency structure, semantic labeling, argument structure, ellipsis, and anaphora resolution. This release updates Prague Czech-English Dependency Treebank 1.0 (LDC2004T25) by adding English newswire texts, so that it now contains over two million words in close to 100,000 sentences.
*Data*
The principal new material in PCEDT 2.0 is the entire Wall Street Journal data from Treebank-3 (LDC99T42). The Reader's Digest material, the Czech monolingual corpus, and the English-Czech dictionary from PCEDT 1.0 are not included. Each section is enhanced with comprehensive manual linguistic annotation in the Prague Dependency Treebank style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are:
* dependency structure of the content words and coordinating and similar structures (function words are attached as their attribute values)
* semantic labeling of content words and types of coordinating structures
* argument structure, including an argument structure (valency) lexicon for both languages
* ellipsis and anaphora resolution
This annotation style is called tectogrammatical annotation, and it constitutes the tectogrammatical layer in the corpus. Please consult the PCEDT website for more information and documentation.
*Samples*
Please follow this link for a sample of the data included.
*Updates*
None at this time. | |
Extent: | Corpus size: 4446421 KB | |
Identifier: | LDC2012T08 | |
https://catalog.ldc.upenn.edu/LDC2012T08 | ||
ISBN: 1-58563-616-9 | ||
ISLRN: 443-974-834-414-7 | ||
DOI: 10.35111/mv82-j246 | ||
Language: | English | |
Czech | ||
Language (ISO639): | eng | |
ces | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Rights Holder: | Portions © 1987-1989 Dow Jones & Company, Inc., © 2002-2012 Charles University in Prague, Institute of Formal and Applied Linguistics, © 1999, 2004, 2012 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2012T08 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Hajič, Jan; Hajičová, Eva; Panevová, Jarmila; Sgall, Petr; Cinková, Silvie; Fučíková, Eva; Mikulová, Marie; Pajas, Petr; Popelka, Jan; Semecký, Jiří; Šindlerová, Jana; Štěpánek, Jan; Toman, Josef; Urešová, Zdeňka; Žabokrtský, Zdeněk. 2012. Linguistic Data Consortium. | |
Terms: | area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text |