OLAC Record: CSLU: Spoltech Brazilian Portuguese Version 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2006S16

Metadata

Title: CSLU: Spoltech Brazilian Portuguese Version 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Schramm, Mauricio C., et al. CSLU: Spoltech Brazilian Portuguese Version 1.0 LDC2006S16. Web Download. Philadelphia: Linguistic Data Consortium, 2006

Contributor: Schramm, Mauricio C.

Freitas, Luis Felipe R.

Zanuz, Adriano

Barone, Dante

Date (W3CDTF): 2006

Date Issued (W3CDTF): 2006-04-17

Description: *Introduction* CSLU: Spoltech Brazilian Portuguese Version 1.0 was developed by the Center for Spoken Language Understanding (CSLU) and contains 5 hours of Portuguese microphone speech with phonetic and orthographic transcriptions. The utterances consist of both read speech (for phonetic coverage) and responses to questions (for spontaneous speech). The corpus contains 8,207 separate utterances produced by 480 speakers from a variety of regions in Brazil. A total of 2,540 utterances have been transcribed at the word level (without time alignments), and 5,479 utterances have been transcribed at the phoneme level (with time alignments). Protocol design, recording, and transcription were performed by the Universidade Federal do Rio Grande do Sul and the Universidade de Caxias do Sul.

*Data* The data were recorded at 44.1 kHz (mono, 16-bit) and stored in RIFF (WAV) format. The microphone was connected directly to a SoundBlaster-compatible sound card. For the prompted sentences, the sentence was hidden from view when recording began, so that the speaker would utter it more naturally. Recording quality was verified immediately after each utterance, and the data-collection software allowed the speaker to re-record any utterance whose quality was insufficient. The acoustic environment was deliberately not controlled, so that the recordings would include the background conditions that occur in application environments.

*Samples* For an example of the data in this corpus, see the audio sample (WAV) and its transcript (TXT) provided with the catalog entry.

*Updates* None at this time.
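The Data section fixes the audio format (44.1 kHz, mono, 16-bit PCM in RIFF/WAV files). A minimal sketch for checking that a file matches this documented format, using only Python's standard `wave` module and assuming the files are plain uncompressed PCM WAV as described. The function names `check_wav_format` and `make_test_wav` are hypothetical, and the in-memory file is a stand-in for a real corpus file.

```python
import io
import wave

# Format stated in the corpus description: 44.1 kHz, mono, 16-bit PCM (RIFF/WAV).
EXPECTED_RATE = 44100
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit


def check_wav_format(source) -> float:
    """Raise ValueError if the WAV does not match the documented format.

    `source` may be a filesystem path or a binary file-like object.
    Returns the duration in seconds.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        sampwidth = wav.getsampwidth()
        if (rate, channels, sampwidth) != (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPWIDTH):
            raise ValueError(
                f"unexpected format: {rate} Hz, {channels} channel(s), {8 * sampwidth}-bit"
            )
        return wav.getnframes() / rate


def make_test_wav(seconds: float) -> io.BytesIO:
    """Hypothetical helper: build an in-memory silent WAV in the documented format."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_SAMPWIDTH)
        wav.setframerate(EXPECTED_RATE)
        # One 16-bit silent sample per frame.
        wav.writeframes(b"\x00\x00" * int(seconds * EXPECTED_RATE))
    buf.seek(0)
    return buf


print(check_wav_format(make_test_wav(1.5)))  # 1.5
```

For a real file, pass its path instead of the in-memory buffer; the returned durations can be summed to compare against the stated 5 hours.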

Extent: Corpus size: 1677721 KB

Format: Sampling Rate: 44100 Hz

Sampling Format: 1-channel PCM
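As a rough consistency check on the extent and format fields, the uncompressed size of 5 hours of audio in the stated format can be estimated. This is an illustrative sketch: it treats "5 hours" as exact although the description rounds it, assumes 1 KB = 1024 bytes, and ignores WAV headers and transcription files, so it is not expected to reproduce the listed 1677721 KB exactly. The function name `raw_pcm_bytes` is hypothetical.

```python
def raw_pcm_bytes(seconds: int, sample_rate: int, channels: int, bytes_per_sample: int) -> int:
    """Size of uncompressed PCM audio in bytes, excluding any file headers."""
    return seconds * sample_rate * channels * bytes_per_sample


# 5 hours at 44.1 kHz, mono, 16-bit (2 bytes per sample), per the record.
audio_bytes = raw_pcm_bytes(5 * 3600, 44100, 1, 2)
print(audio_bytes)         # 1587600000
print(audio_bytes / 1024)  # 1550390.625
```

The estimate is of the same order as the listed corpus size, which is consistent with the audio making up most of the distribution.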

Identifier: LDC2006S16

https://catalog.ldc.upenn.edu/LDC2006S16

ISBN: 1-58563-383-6

ISLRN: 386-396-917-783-5

DOI: 10.35111/kkda-b418

Language: Portuguese

Language (ISO639): por

License: CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2006S16

Rights Holder: Portions © 1994-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2006S16

DateStamp: 2026-03-13

GetRecord: OAI-PMH request for simple DC format
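The GetRecord entries under OLAC Info and OAI Info correspond to standard OAI-PMH 2.0 `GetRecord` requests, which take an `identifier` and a `metadataPrefix` (`olac` for the OLAC format, `oai_dc` for simple Dublin Core). A minimal sketch of building such request URLs, assuming a hypothetical provider endpoint: this record does not state the base URL, so `BASE_URL` below is a placeholder, and `get_record_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the actual OAI-PMH base URL is not given in this record.
BASE_URL = "https://oai.example.org/provider"

# OAI identifier taken from this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2006S16"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH 2.0 GetRecord request URL.

    metadata_prefix is "olac" for the OLAC format or "oai_dc" for simple Dublin Core.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier, which OAI-PMH providers decode as usual. Fetching either URL (for example with `urllib.request.urlopen`) would return the XML record; that step is omitted because the endpoint here is a placeholder.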

Search Info
Citation: Schramm, Mauricio C.; Freitas, Luis Felipe R.; Zanuz, Adriano; Barone, Dante. 2006. Linguistic Data Consortium.
Terms: area_Europe country_PT dcmi_Sound dcmi_Text iso639_por olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006S16
Up-to-date as of: Wed Jul 8 7:30:26 EDT 2026
