OLAC Record oai:lindat.mff.cuni.cz:11234/1-2582 |
Metadata | ||
Title: | English-Urdu Religious Parallel Corpus | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-2582 | |
Creator: | Jawaid, Bushra | |
Zeman, Daniel | ||
Date (W3CDTF): | 2018-01-05T15:38:19Z | |
Date Available: | 2018-01-05T15:38:19Z | |
Description: | The English-Urdu parallel corpus is a collection of religious texts (Quran, Bible) in English and Urdu with sentence alignments. The corpus can be used for experiments with statistical machine translation. Our modifications of the crawled data include, but are not limited to, the following: 1- Manually corrected sentence alignment of the corpora. 2- Our data split (training-development-test), so that our published experiments can be reproduced. 3- Tokenization (optional, but needed to reproduce our experiments). 4- Normalization (optional), e.g. of European vs. Urdu numerals and European vs. Urdu punctuation, and removal of Urdu diacritics. | |
Identifier (URI): | http://hdl.handle.net/11234/1-2582 | |
Language: | English | |
Urdu | ||
Language (ISO639): | eng | |
urd | ||
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | parallel corpus | |
religious text | ||
machine translation | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-2582 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Jawaid, Bushra; Zeman, Daniel. 2018. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Asia area_Europe country_GB country_PK dcmi_Text iso639_eng iso639_urd olac_primary_text |
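
The normalization step mentioned in the Description (European vs. Urdu numerals, European vs. Urdu punctuation, removal of Urdu diacritics) can be illustrated with a minimal Python sketch. This is not the corpus authors' tool. The function names, the direction of the mapping (Urdu forms to European forms), and the choice of which punctuation marks to map are assumptions made purely for illustration.

```python
import unicodedata

# Extended Arabic-Indic digits U+06F0..U+06F9 (the digit forms used in Urdu)
# mapped to European digits 0-9. The mapping direction is an assumption.
DIGIT_MAP = {0x06F0 + i: str(i) for i in range(10)}

# Urdu/Arabic punctuation mapped to European counterparts (illustrative subset).
PUNCT_MAP = {
    0x06D4: ".",  # ARABIC FULL STOP
    0x060C: ",",  # ARABIC COMMA
    0x061B: ";",  # ARABIC SEMICOLON
    0x061F: "?",  # ARABIC QUESTION MARK
}


def strip_diacritics(text):
    """Drop nonspacing combining marks (Unicode category Mn), e.g. zabar, zer, pesh."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def normalize_urdu(text):
    """Hypothetical normalizer: digits, then punctuation, then diacritics."""
    text = text.translate(DIGIT_MAP)
    text = text.translate(PUNCT_MAP)
    return strip_diacritics(text)
```

Because the text is not decomposed before filtering, precomposed letters such as alef with madda (U+0622) are left intact; only standalone combining marks are removed. Whether that matches the normalization actually applied to the corpus is not stated in this record.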