|  | OLAC Record oai:www.ldc.upenn.edu:LDC2017S24 | 
| Metadata | ||
| Title: | CHiME3 | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Barker, Jon, et al. CHiME3 LDC2017S24. Web Download. Philadelphia: Linguistic Data Consortium, 2017 | |
| Contributor: | Barker, Jon | |
| Marxer, Ricard | ||
| Vincent, Emmanuel | ||
| Watanabe, Shinji | ||
| Date (W3CDTF): | 2017 | |
| Date Issued (W3CDTF): | 2017-12-15 | |
| Description: | *Introduction* CHiME3 was developed as part of The 3rd CHiME Speech Separation and Recognition Challenge and contains approximately 342 hours of English speech and transcripts from noisy environments and 50 hours of noisy environment audio. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-world environments. See the CHiME3 home page for more information. The task in CHiME3 was similar to the medium vocabulary track of the CHiME2 Challenge in that the target utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically, the 5,000 word subset of read speech from Wall Street Journal news text. CHiME3 involved two types of data: speech data recorded in very noisy environments (on a bus, in a cafe, in a pedestrian area, and at a street junction) and noisy utterances generated by artificially mixing clean speech data with noisy backgrounds. LDC has also released two CHiME2 corpora -- CHiME2 Grid (LDC2017S07) and CHiME2 WSJ0 (LDC2017S10). *Data* Data is divided into training, development and test sets. All data is provided as 16-bit WAV files sampled at 16 kHz. The audio data consists of background noises, speech data enhanced with the baseline speech enhancement technique, unsegmented noisy speech data, and segmented noisy speech data. Annotation files are in JSON (JavaScript Object Notation) format. Transcripts are plain text in either DOT or TRN format. Also included are three software tools for acoustic simulation, speech enhancement, and ASR. *Samples* Please view the following samples: * Isolated * Enhanced * Embedded * Background * Transcript *Updates* None at this time. | |
| Extent: | Corpus size: 45835296 KB | |
| Format: | Sampling Rate: 16000 | |
| Sampling Format: pcm | ||
| Identifier: | LDC2017S24 | |
| https://catalog.ldc.upenn.edu/LDC2017S24 | ||
| ISBN: 1-58563-826-9 | ||
| ISLRN: 857-070-463-285-8 | ||
| DOI: 10.35111/v154-hj21 | ||
| Language: | English | |
| Language (ISO639): | eng | |
| License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Rights Holder: | Portions © 1987-1989 Dow Jones & Company, Inc., © 2017 Inria Nancy - Grand Est, University of Sheffield, Mitsubishi Electric Research Labs, Fondazione Bruno Kessler, © 1992, 1993, 1996, 2017 Trustees of the University of Pennsylvania | |
| Type (DCMI): | Sound | |
| Text | ||
| Type (OLAC): | primary_text | |
| OLAC Info | ||
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
| OAI Info | ||
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2017S24 | |
| DateStamp: | 2021-03-04 | |
| GetRecord: | OAI-PMH request for simple DC format | |
| Search Info | ||
| Citation: | Barker, Jon; Marxer, Ricard; Vincent, Emmanuel; Watanabe, Shinji. 2017. Linguistic Data Consortium. | |
| Terms: | area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text | |
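
The Description and Format fields above state that all audio is provided as 16-bit PCM WAV sampled at 16 kHz. The following is a minimal sketch, not part of the corpus tooling, of how a downloaded file could be checked against those two values using Python's standard `wave` module. The function names `check_wav_format` and `make_silent_wav` are hypothetical and introduced here for illustration only. The sketch makes no assumption about the number of channels, since the record does not state it.

```python
import io
import wave

# Values taken from the record: "Sampling Rate: 16000" and "16 bit WAV".
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def check_wav_format(source):
    """Check a WAV file (path or binary file-like object) against the stated format.

    Returns (ok, details), where ok is True only if both the sampling rate
    and the sample width match the values listed in the record.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    details = {
        "rate_hz": rate,
        "sample_width_bytes": width,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
    }
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, details


def make_silent_wav(rate_hz, sample_width_bytes, seconds=1):
    """Build an in-memory mono WAV of silence (hypothetical helper for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (rate_hz * sample_width_bytes * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    # Demo on synthetic audio; replace with a path to a real corpus file.
    ok, details = check_wav_format(make_silent_wav(16000, 2))
    print(ok, details)
```

To check a real file, pass its path instead of the in-memory buffer, for example `check_wav_format("some_file.wav")`, where the path is a placeholder rather than an actual corpus file name.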