OLAC Record oai:catalogue.elra.info:ELRA-W0316 |
Metadata | ||
Title: | Ema-lon Manipuri Corpus (including word embedding and language model) | |
Access Rights: | Rights available for: attribution | |
Date Available (W3CDTF): | 2021-10-08 | |
Date Issued (W3CDTF): | 2021-10-08 | |
Description: | The Ema-lon Manipuri Corpus consists of a set of resources for the Manipuri language (locally known as Meiteilon) for the purpose of machine translation. The main source for these resources is the Sangai Express news website. The resources that constitute the present corpus are listed below: 1. EM Corpus, an abbreviation of Ema-lon Manipuri Corpus, which translates to ‘our mother tongue Manipuri corpus’. This is the first comparable corpus built for the Manipuri (mni)-English (eng) language pair from sentences crawled and collected from The Sangai Express (https://www.thesangaiexpress.com) from August 2020 to March 2021. It contains: - Monolingual data: 1,034,715 Manipuri sentences and 846,796 English sentences in version 1, and 1,880,035 Manipuri sentences and 1,450,053 English sentences in version 2. This makes a comparable corpus in the two languages. - Parallel data: 124,975 Manipuri-English aligned sentences extracted from the comparable data version 2. 2. EM-FT is also the first FastText word embedding available for the Manipuri language, trained on 1,880,035 Manipuri sentences. 3. EM-ALBERT is the first ALBERT model available for the Manipuri language, trained on 1,034,715 Manipuri sentences (from the first version of the EM Corpus). | |
Identifier: | ELRA-W0316 | |
ISLRN: 588-170-827-016-7 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0316/ | |
Language: | English | |
Manipuri | ||
Language (ISO639): | eng | |
mni | ||
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0316 | |
DateStamp: | 2021-10-08 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2021. ELRA (European Language Resources Association). | |
Terms: | area_Asia area_Europe country_GB country_IN dcmi_Text iso639_eng iso639_mni olac_primary_text |