OLAC Record oai:www.ldc.upenn.edu:LDC2006S35 |
Metadata | ||
Title: | CSLU: Multilanguage Telephone Speech Version 1.2 | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Muthusamy, Yeshwant, Ronald Cole, and Beatrice Oshika. CSLU: Multilanguage Telephone Speech Version 1.2 LDC2006S35. Web Download. Philadelphia: Linguistic Data Consortium, 2006 | |
Contributor: | Muthusamy, Yeshwant | |
Cole, Ronald Allan | ||
Oshika, Beatrice | ||
Date (W3CDTF): | 2006 | |
Date Issued (W3CDTF): | 2006-06-15 | |
Description: | *Introduction*
CSLU: Multilanguage Telephone Speech Version 1.2 was developed by the Center for Spoken Language Understanding (CSLU) and consists of approximately 38.5 hours of telephone speech in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. About eight hours of this speech have time-aligned phonetic transcripts. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. The current release contains 12,152 speech files from about 2,052 speakers, along with 619 phonetic transcripts. The corpus was collected and developed in 1992.
*Data*
Each subject called the CSLU data collection system by dialing a toll-free number. Most subjects were responding to postings on Usenet newsgroups, and subjects were asked to contribute their voices to scientific research. Participating subjects responded to prompts designed to elicit three types of vocabulary: fixed and useful (language spoken, days of the week, numbers); domain-specific (short open-ended questions); and unrestricted (a monologue on a subject of the caller's choice). Incoming calls arrived on an analog telephone line connected to a Gradient Technologies box, which recorded the data. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
*Samples*
For examples of the data in this corpus, audio samples are available in Korean (WAV), Tamil (WAV), and English (WAV).
*Updates*
None at this time. | |
Extent: | Corpus size: 2202009 KB | |
Format: | Sampling Rate: 8000 Hz | |
Sampling Format: PCM | ||
Identifier: | LDC2006S35 | |
https://catalog.ldc.upenn.edu/LDC2006S35 | ||
ISBN: 1-58563-390-9 | ||
ISLRN: 871-936-811-171-7 | ||
DOI: 10.35111/j0p6-f049 | ||
Language: | Vietnamese | |
Tamil | ||
Spanish | ||
Iranian Persian | ||
Korean | ||
Japanese | ||
Hindi | ||
French | ||
English | ||
German | ||
Mandarin Chinese | ||
Language (ISO639): | vie | |
tam | ||
spa | ||
pes | ||
kor | ||
jpn | ||
hin | ||
fra | ||
eng | ||
deu | ||
cmn | ||
License: | CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2006S35 | |
Rights Holder: | Portions © 1992, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Text | ||
Type (OLAC): | primary_text | |
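The Description and Format fields state an 8 kHz sampling rate and 16-bit linear PCM samples, with one utterance per file. The sketch below shows the byte arithmetic this implies and one way such samples could be decoded in Python. It assumes headerless, single-channel data and lets the caller choose the byte order; the record does not state the actual file layout (any header, channel count, or byte order), so check the corpus documentation before relying on it. The helper names `pcm_duration_seconds` and `decode_pcm16` are hypothetical.

```python
import array
import sys

# Parameters stated in the corpus description.
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2  # 16-bit linear PCM
# Assumption: mono telephone audio (the channel count is not stated in the record).
CHANNELS = 1


def pcm_duration_seconds(num_bytes: int, header_bytes: int = 0) -> float:
    """Hypothetical helper: playing time of 16-bit PCM data of a given size."""
    payload = num_bytes - header_bytes
    if payload < 0:
        raise ValueError("header_bytes exceeds num_bytes")
    return payload / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def decode_pcm16(raw: bytes, byteorder: str = "little") -> list[int]:
    """Hypothetical helper: convert raw 16-bit PCM bytes to signed sample values.

    Assumes `raw` holds only sample data (no file header).
    """
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of the sample size")
    samples = array.array("h")  # signed short, 2 bytes on common platforms
    samples.frombytes(raw)      # interprets bytes in the machine's native order
    if byteorder != sys.byteorder:
        samples.byteswap()      # convert from the file's order to native order
    return samples.tolist()


if __name__ == "__main__":
    # One second of 8 kHz, 16-bit mono audio occupies 16,000 bytes.
    print(pcm_duration_seconds(16_000))
    # 38.5 hours of such audio corresponds to 2,217,600,000 bytes of samples.
    print(pcm_duration_seconds(2_217_600_000) / 3600)
    print(decode_pcm16(b"\x01\x00\xff\xff"))
```

The standard-library `array` module keeps the sketch dependency-free; with NumPy, `numpy.frombuffer(raw, dtype="<i2")` would be an equivalent way to decode little-endian 16-bit samples.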
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2006S35 | |
DateStamp: | 2024-03-19 | |
GetRecord: | OAI-PMH request for simple DC format | |
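The GetRecord entries refer to OAI-PMH requests for this record's identifier in two metadata formats. In OAI-PMH, such a request is an HTTP GET carrying `verb=GetRecord`, an `identifier`, and a `metadataPrefix`. `oai_dc` is the protocol's standard prefix for simple Dublin Core, while `olac` is assumed here as the prefix for the OLAC format. The base URL is a hypothetical placeholder, since the record does not give the provider's endpoint, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not state the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2006S35"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (verb, identifier, metadataPrefix)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # "oai_dc" is the simple Dublin Core format every OAI-PMH provider must support.
    print(get_record_url(BASE_URL, IDENTIFIER, "oai_dc"))
    # Assumption: "olac" as the metadata prefix for the OLAC format.
    print(get_record_url(BASE_URL, IDENTIFIER, "olac"))
```

`urlencode` percent-encodes the colons in the OAI identifier (`oai%3Awww.ldc.upenn.edu%3ALDC2006S35`), which servers decode back to the original identifier.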
Search Info | ||
Citation: | Muthusamy, Yeshwant; Cole, Ronald Allan; Oshika, Beatrice. 2006. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_CN country_DE country_ES country_FR country_GB country_IN country_IR country_JP country_KR country_VN dcmi_Sound dcmi_Text iso639_cmn iso639_deu iso639_eng iso639_fra iso639_hin iso639_jpn iso639_kor iso639_pes iso639_spa iso639_tam iso639_vie olac_primary_text |