OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-1672 |
Metadata | ||
Title: | WMT16 Tuning Shared Task Models (English-to-Czech) | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-1672 | |
Creator: | Kamran, Amir | |
Jawaid, Bushra | ||
Bojar, Ondřej | ||
Stanojevic, Milos | ||
Date (W3CDTF): | 2016-03-22T12:33:39Z | |
Date Available: | 2016-03-22T12:33:39Z | |
Description: | This item contains the models to be tuned in the WMT16 Tuning shared task for English-to-Czech. The translation models are trained on the CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre). Before training, the data is tokenized (using the Moses tokenizer) and lowercased, and sentences longer than 60 words or shorter than 4 words are removed. Alignment is done using fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training. Two 5-gram language models are trained using KenLM: one using only the CzEng Czech data, the other using all Czech monolingual data available for WMT except Common Crawl. Also included are two lexicalized bidirectional reordering models, word-based and hierarchical, with msd conditioned on both source and target of the processed CzEng. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-1672 | |
Language: | English | |
Czech | ||
Language (ISO639): | eng | |
ces | ||
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
University of Amsterdam, ILLC | ||
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | WMT16 | |
machine translation | ||
tuning | ||
baseline models | ||
shared task | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-1672 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos. 2016. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text |
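
The preprocessing summarized in the Description field (lowercasing, then dropping sentences longer than 60 words or shorter than 4 words) can be illustrated with a minimal sketch. This is not the authors' script: the names `keep_sentence` and `filter_parallel` are hypothetical, whitespace splitting stands in for the Moses tokenizer, and applying the length limits to both sides of each sentence pair is an assumption the record does not state.

```python
from typing import Iterable, Iterator, List, Tuple

# Length limits taken from the record's Description field.
MIN_WORDS = 4
MAX_WORDS = 60


def keep_sentence(tokens: List[str]) -> bool:
    """Return True if the sentence has between 4 and 60 tokens, inclusive."""
    return MIN_WORDS <= len(tokens) <= MAX_WORDS


def filter_parallel(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Lowercase each sentence pair and drop pairs outside the length limits.

    Assumption: input is already tokenized, so splitting on whitespace
    recovers the tokens. Assumption: a pair is dropped if either side
    violates the limits.
    """
    for src, tgt in pairs:
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()
        if keep_sentence(src_tokens) and keep_sentence(tgt_tokens):
            yield " ".join(src_tokens), " ".join(tgt_tokens)


if __name__ == "__main__":
    sample = [
        ("This is a short test .", "Toto je krátký test ."),
        ("Too short .", "Příliš krátké ."),
    ]
    for en, cs in filter_parallel(sample):
        print(en, "|||", cs)
```

Writing the filter as a generator lets a large corpus be streamed line by line instead of being loaded into memory.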