OLAC Record
oai:lindat.mff.cuni.cz:11234/1-5518

Metadata
Title:UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 1
Bibliographic Citation:http://hdl.handle.net/11234/1-5518
Creator:Zemánek, Petr
Pospíšil, Adam
Sellat, Hashem
Krubiński, Mateusz
Pecina, Pavel
Date (W3CDTF):2024-06-10T08:56:37Z
Date Available:2024-06-10T08:56:37Z
Description:The corpus contains recordings by native speakers of North Levantine Arabic (apc) acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg. Altogether, there were 13 speakers (9 male and 4 female, aged 1x 15-20, 7x 20-30, 4x 30-40, and 1x 40-50). The recordings contain both monologues and dialogues on topics of everyday life (health, education, family life, sports, culture) as well as information on both the host countries (living abroad) and the country of origin (Syrian traditions, the education system, etc.). Both types are spontaneous: the participants were given only the general subject and either talked on the topic or discussed it freely. The transcription and translation team consisted of students of Arabic at Charles University, with an additional quality check provided by native speakers of the dialect. The textual data is split between the (parallel) transcriptions (.apc) and translations (.eng), with one segment per line. The additional .yaml file provides a mapping to the corresponding audio file (with the duration and offset in the "%S.%03d" format, i.e., seconds and milliseconds) and a unique speaker ID. The audio data is shared in the 48 kHz .wav format, with dialogues and monologues in separate folders. All of the recordings are mono, with a single channel. For dialogues, there is a separate file for each speaker, e.g., "Tar_13052022_Czechia-01.wav" and "Tar_13052022_Czechia-02.wav". The data provided in this repository corresponds to the validation split of the dialectal Arabic to English shared task hosted at the 21st edition of the International Conference on Spoken Language Translation, i.e., IWSLT 2024.
Identifier (URI):http://hdl.handle.net/11234/1-5518
Language:North Levantine Arabic
English
Language (ISO639):apc
eng
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:speech corpus
speech recognition
speech-to-text translation
machine translation
multilingual
Arabic
Arabic Corpus
north levantine
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
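
The Description field above states that offsets and durations are given in the "%S.%03d" format, that the audio is sampled at 48 kHz, and that the .apc and .eng files hold one segment per line. The following is a minimal sketch of how those three facts could be combined when reading the data. The function names are hypothetical, the timing values are assumed to arrive as strings, and the key names used inside the .yaml file are not given in this record, so they are deliberately left out.

```python
import re

SAMPLE_RATE_HZ = 48_000  # sampling rate stated in the Description field

# "%S.%03d": whole seconds, a dot, then exactly three millisecond digits.
_TIME_RE = re.compile(r"(\d+)\.(\d{3})")


def parse_time_ms(text: str) -> int:
    """Convert a "%S.%03d" string such as "12.345" to integer milliseconds."""
    match = _TIME_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in %S.%03d format: {text!r}")
    seconds, millis = match.groups()
    return int(seconds) * 1000 + int(millis)


def segment_sample_range(offset: str, duration: str,
                         sample_rate: int = SAMPLE_RATE_HZ) -> tuple[int, int]:
    """Return (start, end) sample indices for a segment of a mono recording."""
    start_ms = parse_time_ms(offset)
    end_ms = start_ms + parse_time_ms(duration)
    # Integer arithmetic avoids floating-point rounding at segment boundaries.
    return start_ms * sample_rate // 1000, end_ms * sample_rate // 1000


def pair_segments(apc_lines, eng_lines) -> list[tuple[str, str]]:
    """Pair transcription and translation lines, one segment per line."""
    apc = [line.rstrip("\n") for line in apc_lines]
    eng = [line.rstrip("\n") for line in eng_lines]
    if len(apc) != len(eng):
        raise ValueError("parallel files must have the same number of lines")
    return list(zip(apc, eng))
```

Parsing the timing strings with integer arithmetic rather than `float()` keeps the millisecond precision exact. If a YAML loader has already converted the values to floats, they would need to be formatted back to three decimals first.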

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-5518
DateStamp:  2024-06-10
GetRecord:  OAI-PMH request for simple DC format
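
The GetRecord entries in this record refer to OAI-PMH requests. As a rough sketch, such a request is an HTTP GET with the `verb`, `identifier`, and `metadataPrefix` parameters. The base URL below is a placeholder, not the archive's real endpoint, and the function name is hypothetical. `oai_dc` is the standard OAI-PMH prefix for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder only: the archive's actual OAI-PMH base URL is not given here.
PLACEHOLDER_BASE_URL = "https://repository.example.org/oai"


def build_get_record_url(base_url: str, identifier: str,
                         metadata_prefix: str = "oai_dc") -> str:
    """Compose an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# The identifier is the OaiIdentifier listed in this record.
olac_url = build_get_record_url(
    PLACEHOLDER_BASE_URL, "oai:lindat.mff.cuni.cz:11234/1-5518", "olac")
dc_url = build_get_record_url(
    PLACEHOLDER_BASE_URL, "oai:lindat.mff.cuni.cz:11234/1-5518")
```

`urlencode` percent-encodes the colons and the slash in the identifier, which keeps the query string valid regardless of the characters an identifier contains.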

Search Info

Citation: Zemánek, Petr; Pospíšil, Adam; Sellat, Hashem; Krubiński, Mateusz; Pecina, Pavel. 2024. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Asia area_Europe country_GB country_SY dcmi_Text iso639_apc iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-5518
Up-to-date as of: Wed Mar 5 0:42:37 EST 2025