OLAC Record oai:lindat.mff.cuni.cz:11234/1-5813 |
Metadata | ||
Title: | Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0) | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-5813 | |
Creator: | Hajič, Jan | |
Bejček, Eduard | ||
Bémová, Alevtina | ||
Buráňová, Eva | ||
Fučíková, Eva | ||
Hajičová, Eva | ||
Havelka, Jiří | ||
Hlaváčová, Jaroslava | ||
Homola, Petr | ||
Ircing, Pavel | ||
Kárník, Jiří | ||
Kettnerová, Václava | ||
Klyueva, Natalia | ||
Kolářová, Veronika | ||
Kučová, Lucie | ||
Lopatková, Markéta | ||
Mareček, David | ||
Mikulová, Marie | ||
Mírovský, Jiří | ||
Nedoluzhko, Anna | ||
Novák, Michal | ||
Pajas, Petr | ||
Panevová, Jarmila | ||
Peterek, Nino | ||
Poláková, Lucie | ||
Popel, Martin | ||
Popelka, Jan | ||
Romportl, Jan | ||
Rysová, Magdaléna | ||
Semecký, Jiří | ||
Sgall, Petr | ||
Spoustová, Johanka | ||
Straka, Milan | ||
Straňák, Pavel | ||
Synková, Pavlína | ||
Ševčíková, Magda | ||
Šindlerová, Jana | ||
Štěpánek, Jan | ||
Štěpánková, Barbora | ||
Toman, Josef | ||
Urešová, Zdeňka | ||
Vidová Hladká, Barbora | ||
Zeman, Daniel | ||
Zikánová, Šárka | ||
Žabokrtský, Zdeněk | ||
Date (W3CDTF): | 2025-01-09T16:55:41Z | |
Date Available: | 2025-01-09T16:55:41Z | |
Description: | A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank – Consolidated 2.0 (PDT-C 2.0) is a consolidated release of the existing PDT corpora of Czech data, uniformly annotated using the standard PDT scheme. PDT corpora included in PDT-C: Prague Dependency Treebank (written newspaper and journal texts from three genres); Czech part of Prague Czech-English Dependency Treebank (financial texts translated from English); Prague Dependency Treebank of Spoken Czech (spoken data, including audio, transcripts, and multiple speech reconstruction annotation); PDT-Faust (user-generated texts). The original treebanks, previously published separately, are released in one package to allow easier data handling across all the datasets, and they are enhanced with further manual linguistic annotation. In the previous PDT-C 1.0 version, the data was enhanced with manual linguistic annotation at the morphological layer. For the PDT-C 2.0 version, manual annotation at the analytical layer was performed in those parts of the corpus that were previously annotated only by automatic tools. Another goal of the annotation work was to consolidate the manual annotation across all layers, which resulted in many modifications and corrections to the original annotation. Manual annotation of discourse relations is now also provided for all PDT-C 2.0 data. The PDT-C 2.0 release therefore has manual annotation at all annotation layers (morphological, surface syntactic (analytical), and deep syntactic (tectogrammatical)) in all four datasets. Additional semantic features in the PDT dataset are also manually annotated. A new version of the morphological dictionary is enclosed, as is a common valency lexicon for all four original parts. Documentation provides two browsing and editing desktop tools (TrEd and MEd), and the corpus is also available online for searching using PML-TQ. | |
Identifier (URI): | http://hdl.handle.net/11234/1-5813 | |
Language: | Czech | |
Language (ISO639): | ces | |
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Replaces (URI): | http://hdl.handle.net/11234/1-3185 | |
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | treebank | |
dependency | ||
tectogrammatics | ||
topic-focus articulation | ||
multiword expressions | ||
coreference | ||
bridging relations | ||
discourse | ||
morphology | ||
syntax | ||
tokenization | ||
lemmatization | ||
semantic relations | ||
lexical semantics | ||
lexicon | ||
valency | ||
speech reconstruction | ||
clauses | ||
speech recognition | ||
spoken corpus | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-5813 | |
DateStamp: | 2025-01-09 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Hajič, Jan; Bejček, Eduard; Bémová, Alevtina; Buráňová, Eva; Fučíková, Eva; Hajičová, Eva; Havelka, Jiří; Hlaváčová, Jaroslava; Homola, Petr; Ircing, Pavel; Kárník, Jiří; Kettnerová, Václava; Klyueva, Natalia; Kolářová, Veronika; Kučová, Lucie; Lopatková, Markéta; Mareček, David; Mikulová, Marie; Mírovský, Jiří; Nedoluzhko, Anna; Novák, Michal; Pajas, Petr; Panevová, Jarmila; Peterek, Nino; Poláková, Lucie; Popel, Martin; Popelka, Jan; Romportl, Jan; Rysová, Magdaléna; Semecký, Jiří; Sgall, Petr; Spoustová, Johanka; Straka, Milan; Straňák, Pavel; Synková, Pavlína; Ševčíková, Magda; Šindlerová, Jana; Štěpánek, Jan; Štěpánková, Barbora; Toman, Josef; Urešová, Zdeňka; Vidová Hladká, Barbora; Zeman, Daniel; Zikánová, Šárka; Žabokrtský, Zdeněk. 2025. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text |
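
The GetRecord entries in this record refer to OAI-PMH requests for the OLAC and simple DC formats. As a rough illustration of how such a request URL is put together, the sketch below builds one for the OaiIdentifier shown above. It rests on several assumptions: `BASE_URL` is a placeholder and not the archive's real endpoint, the function names are hypothetical, `oai_dc` is the standard OAI-PMH prefix for simple Dublin Core, and `olac` is assumed to be the prefix this repository uses for the OLAC format.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Identifier taken from the "OaiIdentifier" field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-5813"

# Placeholder endpoint (assumption): substitute the repository's actual
# OAI-PMH base URL, which is not given in this record.
BASE_URL = "https://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Return an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


def query_params(url: str) -> dict:
    """Decode the query string of a URL back into a flat dict."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


# "olac" is assumed to be the prefix for the OLAC format;
# "oai_dc" is the standard OAI-PMH prefix for simple Dublin Core.
olac_url = build_getrecord_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = build_getrecord_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

The identifier contains `:` and `/`, so `urlencode` percent-encodes it. `query_params` decodes a built URL again, which is a quick way to check that the identifier survives the round trip unchanged.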