OLAC Record oai:lindat.mff.cuni.cz:11234/1-5053

Metadata
Title: Coreference in Universal Dependencies 1.1 (CorefUD 1.1)
Bibliographic Citation: http://hdl.handle.net/11234/1-5053
Creator: Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel; Nedoluzhko, Anna; Acar, Kutay; Bourgonje, Peter; Cinková, Silvie; Cebiroğlu Eryiğit, Gülşen; Hajič, Jan; Hardmeier, Christian; Haug, Dag; Jørgensen, Tollef; Kåsen, Andre; Krielke, Pauline; Landragin, Frédéric; Lapshinova-Koltunski, Ekaterina; Mæhlum, Petter; Martí, M. Antònia; Mikulová, Marie; Nøklestad, Anders; Ogrodniczuk, Maciej; Øvrelid, Lilja; Pamay Arslan, Tuğba; Recasens, Marta; Solberg, Per Erik; Stede, Manfred; Straka, Milan; Toldova, Svetlana; Vadász, Noémi; Velldal, Erik; Vincze, Veronika; Zeldes, Amir; Žitkus, Voldemaras
Date (W3CDTF): 2023-02-25T16:12:43Z
Date Available: 2023-02-25T16:12:43Z
Description: CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In its current version 1.1, CorefUD consists of 21 datasets for 13 languages. The datasets are enriched with automatic morphological and syntactic annotations that fully comply with the standards of the Universal Dependencies project. All datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs in the MISC column.

The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The public edition is distributed via LINDAT/CLARIAH-CZ and contains 17 datasets for 12 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 2 for Hungarian, 1 for Lithuanian, 2 for Norwegian, 1 for Polish, 1 for Russian, 1 for Spanish, and 1 for Turkish), excluding the test data. The non-public edition is available only internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute because of the limitations of their original licenses. It also contains the test data portions of all datasets.

When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource.

Compared to the previous version 1.0, version 1.1 adds new languages and corpora, namely Hungarian-KorKor, Norwegian-BokmaalNARC, Norwegian-NynorskNARC, and Turkish-ITCC. In addition, the English GUM dataset has been updated to a newer and larger version, and the conversion pipelines for most datasets have been refined (the changes in each dataset are listed in its README file).
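The Description field states that coreference and bridging information is stored as attribute-value pairs in the MISC column of CoNLL-U files. As an illustration only, here is a minimal Python sketch that reads those pairs from token lines, assuming the standard CoNLL-U layout: 10 tab-separated columns, with MISC last, `|` separating pairs and `_` marking an empty field. The function names `parse_misc` and `iter_misc` are hypothetical. The attribute name `Entity` and the value `(e1)` in the usage example are placeholders, not a statement of CorefUD's exact syntax; the actual attribute names and value formats should be checked in each dataset's README file.

```python
from typing import Dict, Iterator


def parse_misc(token_line: str) -> Dict[str, str]:
    """Return the attribute-value pairs in the MISC column of one CoNLL-U
    token line (10 tab-separated columns; MISC is the last one)."""
    columns = token_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    misc = columns[9]
    if misc == "_":  # an underscore marks an empty field in CoNLL-U
        return {}
    pairs: Dict[str, str] = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else ""
    return pairs


def iter_misc(conllu_text: str) -> Iterator[Dict[str, str]]:
    """Yield the MISC pairs of every token line, skipping comment lines
    (starting with '#') and the blank lines that separate sentences."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_misc(line)


# Usage with a made-up two-token sentence (placeholder MISC values):
doc = (
    "# sent_id = 1\n"
    "1\tJohn\tJohn\tPROPN\t_\t_\t2\tnsubj\t_\tEntity=(e1)|SpaceAfter=No\n"
    "2\truns\trun\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)
for misc in iter_misc(doc):
    print(misc)
```

Splitting only on the first `=` keeps any value that itself contains `=` intact. Handling multiword-token and empty-node lines would need extra checks on the ID column, which this sketch omits.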
Identifier (URI): http://hdl.handle.net/11234/1-5053
Language: Catalan; Czech; English; French; German; Hungarian; Lithuanian; Norwegian; Polish; Russian; Spanish; Turkish
Language (ISO639): cat; ces; eng; fra; deu; hun; lit; nor; pol; rus; spa; tur
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Replaces (URI): http://hdl.handle.net/11234/1-4698
Rights: Licence CorefUD v1.1, https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-1.1
Subject: dependency; treebank; coreference; bridging relations; harmonized annotation
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info
Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

OAI Info
OaiIdentifier: oai:lindat.mff.cuni.cz:11234/1-5053
DateStamp: 2023-02-25

Search Info
Citation: Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel; Nedoluzhko, Anna; Acar, Kutay; Bourgonje, Peter; Cinková, Silvie; Cebiroğlu Eryiğit, Gülşen; Hajič, Jan; Hardmeier, Christian; Haug, Dag; Jørgensen, Tollef; Kåsen, Andre; Krielke, Pauline; Landragin, Frédéric; Lapshinova-Koltunski, Ekaterina; Mæhlum, Petter; Martí, M. Antònia; Mikulová, Marie; Nøklestad, Anders; Ogrodniczuk, Maciej; Øvrelid, Lilja; Pamay Arslan, Tuğba; Recasens, Marta; Solberg, Per Erik; Stede, Manfred; Straka, Milan; Toldova, Svetlana; Vadász, Noémi; Velldal, Erik; Vincze, Veronika; Zeldes, Amir; Žitkus, Voldemaras. 2023. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Asia area_Europe country_CZ country_DE country_ES country_FR country_GB country_HU country_LT country_NO country_PL country_RU country_TR dcmi_Text iso639_cat iso639_ces iso639_deu iso639_eng iso639_fra iso639_hun iso639_lit iso639_nor iso639_pol iso639_rus iso639_spa iso639_tur olac_primary_text