OLAC Record: GlobalPhone Tamil

OLAC Record
oai:catalogue.elra.info:ELRA-S0205

Metadata

Title: GlobalPhone Tamil

Access Rights: Rights available for: nonCommercialUse, commercialUse

Date Available (W3CDTF): 2006-01-30

Date Issued (W3CDTF): 2006-01-30

Date Modified (W3CDTF): 2017-06-26

Description: The GlobalPhone corpus, developed in collaboration with the Karlsruhe Institute of Technology (KIT), was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language-independent and language-adaptive speech recognition as well as for language identification tasks. The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 22 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian (ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German (ELRA-S0198), Hausa (ELRA-S0347), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Spanish (Latin America) (ELRA-S0203), Swahili (ELRA-S0375), Swedish (ELRA-S0204), Tamil (ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Ukrainian (ELRA-S0377), and Vietnamese (ELRA-S0322). In each language, about 100 sentences were read by each of the 100 speakers. The read texts were selected from national newspapers available via the Internet to provide a large vocabulary. The read articles cover national and international political news as well as economic news. The speech is available in 16-bit, 16 kHz mono quality, recorded with a close-speaking microphone (Sennheiser 440-6). The transcriptions are internally validated and supplemented by special markers for spontaneous effects like stuttering and false starts, and for non-verbal effects like laughing and hesitations. Speaker information such as age, gender, and occupation, as well as information about the recording setup, complements the database.
The entire GlobalPhone corpus contains over 450 hours of speech spoken by more than 2100 native adult speakers. The data is compressed ("shortened") by means of the shorten program written by Tony Robinson. Alternatively, the data can be delivered unshortened. The Tamil corpus was produced using the Thinaboomi Tamil Daily newspaper. It contains recordings of 47 speakers (gender unspecified) recorded in India. No age distribution is available.

Identifier: ELRA-S0205

ISLRN: 269-930-371-035-1

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-S0205/

Language: Tamil

Language (ISO639): tam

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-S0205

DateStamp: 2006-01-30

GetRecord: OAI-PMH request for simple DC format
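
The GetRecord entries above are links whose targets are not reproduced in this record. As a minimal sketch of how such a request is typically formed under OAI-PMH, the following builds a GetRecord URL for this identifier. The base URL `https://example.org/oai` is a hypothetical placeholder, not the archive's actual endpoint, and the function name `get_record_url` is illustrative. The prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" mentioned above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the real OAI-PMH base URL of the data provider.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-S0205"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL.

    OAI-PMH GetRecord takes three arguments: verb, identifier,
    and metadataPrefix. Values are percent-encoded by urlencode.
    """
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple Dublin Core.
    print(get_record_url(BASE_URL, OAI_IDENTIFIER, "olac"))
    print(get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc"))
```

Only the URL is constructed here; no request is sent, since the real endpoint is not given in this record.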

Search Info
Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: area_Asia country_IN dcmi_Sound iso639_tam olac_primary_text

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0205
Up-to-date as of: Wed Jul 15 7:03:32 EDT 2026
