OLAC Record oai:catalogue.elra.info:ELRA-S0404 |
Metadata | ||
Title: | MGB-5 Moroccan Dialect | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Coverage: | Morocco | |
Date Available (W3CDTF): | 2023-04-04 | |
Date Issued (W3CDTF): | 2023-04-04 | |
Description: | The MGB-5 Moroccan Dialect comprises 14 hours of Moroccan Arabic speech extracted from 93 YouTube videos distributed across seven genres: comedy, cooking, family/children, fashion, drama, sports, and science clips. Because dialectal Arabic does not have a clearly defined orthography, different people tend to write the same word in slightly different forms. Therefore, instead of strict guidelines being developed to enforce a standardized orthography, variations in spelling are allowed. Multiple transcriptions were thus produced, with transcribers writing the transcripts as they deemed correct. Every file has been segmented and transcribed by four different Moroccan annotators. The 93 YouTube clips have been manually labelled for speech and non-speech segments. About 12 minutes from each program were selected for transcription. The resulting speech segments were then distributed into training, development, and test data sets as follows: Training data: 10.2 hours from 69 programs; Development data: 1.8 hours from 10 programs; Testing data: 2.0 hours from 14 programs. In addition to the transcribed 14 hours, the full programs are also provided, which amount to 48 hours for the 93 programs. This data can be used for in-domain speech or genre adaptation. | |
Identifier: | ELRA-S0404 | |
ISLRN: 938-639-614-524-5 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-S0404/ | |
Language: | Arabic | |
Language (ISO639): | ara | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Sound | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-S0404 | |
DateStamp: | 2023-04-04 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2023. ELRA (European Language Resources Association). | |
Terms: | dcmi_Sound iso639_ara olac_primary_text |