OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-1646 |
Metadata | ||
Title: | WMT16 Quality Estimation Shared Task Training and Development Data | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-1646 | |
Creator: | Specia, Lucia | |
Logacheva, Varvara | ||
Scarton, Carolina | ||
Date (W3CDTF): | 2016-02-29T13:37:23Z | |
Date Available: | 2016-02-29T13:37:23Z | |
Description: | Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its four previous editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-edits by professional translators (as opposed to the crowdsourced translations used in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will, also for the first time, use entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals: - To advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets. - To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction. - To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction. This year's shared task provides new training and test datasets for all tasks and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-1646 | |
Is Replaced By (URI): | http://hdl.handle.net/11372/LRT-1974 | |
Language: | English | |
German | ||
Language (ISO639): | eng | |
deu | ||
Publisher: | University of Sheffield | |
Replaces (URI): | http://hdl.handle.net/11372/LRT-1631 | |
Rights: | AGREEMENT ON THE USE OF DATA IN QT21 | |
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21 | ||
Subject: | machine translation | |
quality estimation | ||
machine learning | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
OAI Info |
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-1646 | |
DateStamp: | 2021-06-29 | |
Search Info | ||
Citation: | Specia, Lucia; Logacheva, Varvara; Scarton, Carolina. 2016. University of Sheffield. | |
Terms: | area_Europe country_DE country_GB dcmi_Text iso639_deu iso639_eng olac_primary_text | |