OLAC Record oai:www.ldc.upenn.edu:LDC2025S07 |
Metadata | ||
Title: | Mixer 6 - CHiME 8 Transcribed Calls and Interviews | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Wiesner, Matthew, et al. Mixer 6 - CHiME 8 Transcribed Calls and Interviews LDC2025S07. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
Contributor: | Wiesner, Matthew | |
Raj, Desh | ||
Maciejewski, Matthew | ||
Haviland, Chloe | ||
Cornell, Samuele | ||
Chodroff, Eleanor | ||
Khudanpur, Sanjeev | ||
Godfrey, Jack | ||
Date (W3CDTF): | 2025 | |
Date Issued (W3CDTF): | 2025-08-15 | |
Description: | *Introduction* Mixer 6 - CHiME 8 Transcribed Calls and Interviews was developed for the 7th and 8th CHiME (Computational Hearing in Multisource Environments) challenges. It contains 80 hours of English interviews and telephone speech from Mixer 6 Speech (LDC2013S03), with transcripts developed for the CHiME challenges and divided into training, development and test sets. This data was used in CHiME 7 Task 1 and CHiME 8 Task 1, both of which focused on transcription and segmentation across varied recording conditions such as interviews, meetings, and dinner parties, with an emphasis on generalization across recording device types and array topologies. Mixer 6 Speech was developed by the Linguistic Data Consortium (LDC) and comprises 15,863 hours of audio recordings of interviews, transcript readings and conversational telephone speech involving 594 distinct native English speakers recorded over 14 channels. This material was collected by LDC in 2009 and 2010 as part of the Mixer project, specifically phase 6, which focused on native American English speakers local to the Philadelphia area. *Data* The data includes audio from Mixer 6 Speech recorded on 13 microphones, for a total of 1,063 hours of audio corresponding to 80 hours of speech. The development and test splits are speaker-disjoint from the training data and consist of fully transcribed, multi-microphone interviews. The transcripts were developed in three phases: (1) manual transcription, segmentation and automatic alignment with speech; (2) splitting sessions into sets; and (3) splitting certain sessions from the training set. Each segment is labeled with the speaker, the uttered text, and the segment's start and end times in seconds. Audio data is provided as 16-bit FLAC files sampled at 16 kHz. Transcripts are released as UTF-8 encoded JSON files. *Samples* Please view the following samples: * Speech Audio (FLAC) * Transcripts (JSON) *Updates* No updates at this time. | |
Extent: | Corpus size: 108000000 KB | |
Format: | Sampling Rate: 16000 | |
Sampling Format: 16-bit FLAC | ||
Identifier: | LDC2025S07 | |
https://catalog.ldc.upenn.edu/LDC2025S07 | ||
ISLRN: 017-424-674-662-6 | ||
DOI: 10.35111/pk0y-qp29 | ||
Language: | English | |
Language (ISO639): | eng | |
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025S07 | |
Rights Holder: | Portions © 2009-2010, 2013, 2025 Trustees of the University of Pennsylvania | |
Subject: | English language | |
Subject (ISO639): | eng | |
Subject (OLAC): | text_and_corpus_linguistics | |
Type (DCMI): | Sound | |
Text | ||
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025S07 | |
DateStamp: | 2025-08-15 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Wiesner, Matthew; Raj, Desh; Maciejewski, Matthew; Haviland, Chloe; Cornell, Samuele; Chodroff, Eleanor; Khudanpur, Sanjeev; Godfrey, Jack. 2025. Linguistic Data Consortium. | |
Terms: | area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text olac_text_and_corpus_linguistics | |
Inferred Metadata | ||
Country: | United Kingdom | |
Area: | Europe |
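
The Description above says that each transcript segment carries a speaker label, the uttered text, and start and end times in seconds, and that transcripts are UTF-8 encoded JSON. The following is a minimal Python sketch of how such segments might be summed into per-speaker speech time. The field names (`speaker`, `words`, `start_time`, `end_time`), the string-typed timestamps, and the sample values are all assumptions made for illustration. They are not taken from the corpus documentation, so check them against the released files before relying on them.

```python
import json
from collections import defaultdict

# Hypothetical sample data. The keys and values below are illustrative
# assumptions, not the documented schema of the LDC2025S07 transcripts.
SAMPLE = """
[
  {"speaker": "spk_a", "words": "hello there", "start_time": "0.50", "end_time": "2.00"},
  {"speaker": "spk_b", "words": "hi", "start_time": "2.10", "end_time": "2.60"},
  {"speaker": "spk_a", "words": "how are you", "start_time": "3.00", "end_time": "4.25"}
]
"""


def speech_seconds_by_speaker(segments):
    """Sum segment durations (end minus start, in seconds) per speaker."""
    totals = defaultdict(float)
    for seg in segments:
        start = float(seg["start_time"])
        end = float(seg["end_time"])
        if end < start:
            raise ValueError(f"segment ends before it starts: {seg!r}")
        totals[seg["speaker"]] += end - start
    return dict(totals)


segments = json.loads(SAMPLE)
totals = speech_seconds_by_speaker(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.2f} s")
```

For a real transcript file, the `json.loads(SAMPLE)` call would be replaced by `json.load` on a file opened with `encoding="utf-8"`, and the key names adjusted to match the actual files.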