OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1646

Metadata
Title:WMT16 Quality Estimation Shared Task Training and Development Data
Bibliographic Citation:http://hdl.handle.net/11372/LRT-1646
Creator:Specia, Lucia
Logacheva, Varvara
Scarton, Carolina
Date (W3CDTF):2016-02-29T13:37:23Z
Date Available:2016-02-29T13:37:23Z
Description:Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence and word-level tasks will explore a large dataset produced from post-editions by professional translators (as opposed to crowdsourced translations as in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals: - To advance work on sentence and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets. - To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction. - To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction. This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
Identifier (URI):http://hdl.handle.net/11372/LRT-1646
Is Replaced By (URI):http://hdl.handle.net/11372/LRT-1974
Language:English
German
Language (ISO639):eng
deu
Publisher:University of Sheffield
Replaces (URI):http://hdl.handle.net/11372/LRT-1631
Rights:AGREEMENT ON THE USE OF DATA IN QT21
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
Subject:machine translation
quality estimation
machine learning
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1646
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Specia, Lucia; Logacheva, Varvara; Scarton, Carolina. 2016. University of Sheffield.
Terms: area_Europe country_DE country_GB dcmi_Text iso639_deu iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1646
Up-to-date as of: Thu Oct 5 0:40:27 EDT 2023