OLAC Record: Korean Telephone Conversations Transcripts

OLAC Record
oai:www.ldc.upenn.edu:LDC2003T08

Metadata

Title: Korean Telephone Conversations Transcripts

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Ko, Eon-Suk, et al. Korean Telephone Conversations Transcripts LDC2003T08. Web Download. Philadelphia: Linguistic Data Consortium, 2003

Contributor: Ko, Eon-Suk

Han, Na-Rae

Strassel, Stephanie

Martey, Nii

Date (W3CDTF): 2003

Date Issued (W3CDTF): 2003-05-16

Description: *Introduction* Korean Telephone Conversations Transcripts was produced by the Linguistic Data Consortium (LDC) and contains transcripts of 100 telephone calls in Korean, totaling approximately 190,000 words. The telephone conversations on which these transcripts are based were originally recorded as part of the CALLFRIEND project. The CALLFRIEND Korean telephone speech was collected by the LDC primarily in support of the Language Identification (LID) project, sponsored by the U.S. Department of Defense. The calls were later transcribed for use in other projects. The corresponding speech files for these transcripts are available in Korean Telephone Conversations Speech (LDC2003S03). The Korean orthographic forms from the 100 transcription files serve as the head-words in the associated Korean Telephone Conversations Lexicon.

The recorded conversations are between native speakers of Korean and last up to 30 minutes, of which the transcribed speech covers between 15 and 18 minutes. All speakers were aware that they were being recorded. They were given no guidelines concerning what they should talk about. Once a caller was recruited to participate, he or she was given a free choice of whom to call. Most participants called family members or close friends. All calls originated in either the United States or Canada.

*Data* There are 100 time-aligned text files, totaling approximately 190,000 words and 25,000 unique words. Where the written form and the actual pronunciation of a spoken word differ, the transcription follows the orthographic form. When the mismatch goes beyond what the pronunciation dictionary can predict, the word is marked with a '+' symbol. All files are in Korean orthography: the characters are Hangul, encoded in the KSC5601 (Wansung) character set. EUC-KR and ISO-2022-KR are the two common byte encodings of that character set.

*Samples* A sample transcript is available as txt | gif.

*Updates* There are no updates available at this time.
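The encoding and '+' marking described above can be illustrated with a minimal Python sketch. It assumes a transcript is held as EUC-KR bytes and that marked words are whitespace-separated tokens containing '+'. The record does not specify the transcript line layout or where the '+' sits within a word, so the function names and the sample string are hypothetical.

```python
def decode_transcript(raw: bytes, encoding: str = "euc_kr") -> str:
    """Decode transcript bytes; 'euc_kr' is Python's codec name for EUC-KR.

    Pass encoding="iso2022_kr" instead if the bytes turn out to be ISO-2022-KR.
    """
    return raw.decode(encoding)


def marked_tokens(text: str, marker: str = "+") -> list[str]:
    """Return whitespace-separated tokens that contain the mismatch marker.

    Assumption: the marker is attached to the token it flags.
    """
    return [tok for tok in text.split() if marker in tok]


# Hypothetical sample, not taken from the corpus.
sample_bytes = "안녕하세요 +그래요 네".encode("euc_kr")
text = decode_transcript(sample_bytes)
print(marked_tokens(text))
```

Python treats `euc_kr` and `iso2022_kr` as distinct codecs, so decoding with the wrong one will typically raise `UnicodeDecodeError` or produce unreadable text rather than silently succeed.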

Extent: Corpus size: 3993 KB

Identifier: LDC2003T08

https://catalog.ldc.upenn.edu/LDC2003T08

ISBN: 1-58563-264-3

ISLRN: 248-953-409-804-2

DOI: 10.35111/92vj-wg93

Language: Korean

Language (ISO639): kor

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2003T08

Rights Holder: Portions © 2003 Trustees of the University of Pennsylvania.

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2003T08

DateStamp: 2024-09-13

GetRecord: OAI-PMH request for simple DC format
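
The two GetRecord entries correspond to OAI-PMH requests that differ only in their `metadataPrefix`. The sketch below shows how such a request URL might be assembled in Python. The base URL is a hypothetical placeholder, because the record does not state the repository endpoint. `oai_dc` is the standard OAI-PMH prefix for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

OAI_ID = "oai:www.ldc.upenn.edu:LDC2003T08"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


print(get_record_url(OAI_ID, "oai_dc"))  # simple DC format
print(get_record_url(OAI_ID, "olac"))    # OLAC format (assumed prefix)
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid.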

Search Info
Citation: Ko, Eon-Suk; Han, Na-Rae; Strassel, Stephanie; Martey, Nii. 2003. Linguistic Data Consortium.
Terms: area_Asia country_KR dcmi_Text iso639_kor olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2003T08
Up-to-date as of: Wed Oct 29 7:00:16 EDT 2025
