OLAC Record: Languages in Migration

OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-4777

Metadata

Title: Languages in Migration

Bibliographic Citation: http://hdl.handle.net/11372/LRT-4777

Creator: Bučková, Aneta

Creator: Nekula, Marek

Creator: Lukeš, David

Creator: Woźniak, Michał

Creator: Wastl, Michael

Creator: Polowy, Louisa

Date (W3CDTF): 2023-02-24T17:10:50Z

Date Available: 2023-02-24T17:10:50Z

Description: LANGUAGES IN MIGRATION is designed to represent authentic spoken Czech and German as used in informal speech (private setting, spontaneity, unpreparedness, etc.) by Czech-German bilingual speakers who were born in Czechoslovakia around 1955 and left for Germany after the age of 12. The corpus consists of interviews on language biographies, conducted in 2018–2020 with 20 speakers and narrated in Czech and German, respectively. Ten interviews were recorded with late (German) repatriates and ten with Czech migrants. The corpus includes transcripts of ca. 14 hours of Czech recordings and ca. 13.5 hours of German recordings. It contains 217,650 orthographic words (i.e. a total of 286,533 tokens including punctuation). The metadata include basic sociolinguistically relevant speaker categories (gender, year of birth, year of migration, level of education, region of childhood, and region of present residence). The transcription is linked to the corresponding audio track. It was carried out on an orthographic tier and supplemented by an additional metalanguage tier. The corpus is lemmatized and morphologically tagged in different formats for Czech and for German (the latter using the Stuttgart-Tübingen-Tagset). Deviations from the norm of the spoken Czech and German of the homeland, which are understood as the result of language contact and language isolation, are tagged in a further tier in both the Czech and the German subcorpora. The (anonymized) corpus is provided in the form of transcripts in EAF format, which can be viewed in the freely available ELAN program, and in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query engine to registered users of the CNC (http://www.korpus.cz).

Identifier (URI): http://hdl.handle.net/11372/LRT-4777

Language: German

Language: Czech

Language (ISO639): deu

Language (ISO639): ces

Publisher: Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague

Publisher: Universität Regensburg

Rights: Czech National Corpus (Shuffled Corpus Data)

Rights: https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc

Subject: spoken language

Subject: bilingual

Subject: syntactic annotation

Subject: migrant language

Subject: narrative interviews

Subject: language biography

Type: corpus

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11372/LRT-4777

DateStamp: 2023-02-24

GetRecord: OAI-PMH request for simple DC format
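
The GetRecord entries in this record refer to OAI-PMH requests. As a minimal sketch, such a request URL could be assembled in Python as shown here. The `verb`, `identifier`, and `metadataPrefix` parameters are defined by the OAI-PMH protocol, and `oai_dc` is its prefix for simple Dublin Core. The base URL is a placeholder, since this record does not state the repository's OAI-PMH endpoint; `BASE_URL` and `get_record_url` are hypothetical names, and `olac` is assumed to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder only: the record does not give the repository's OAI-PMH endpoint.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier: str,
                   metadata_prefix: str = "oai_dc",
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # the OaiIdentifier listed above
        "metadataPrefix": metadata_prefix,  # "oai_dc" = simple Dublin Core
    })
    return f"{base_url}?{query}"


# Simple DC request for this record (the identifier is percent-encoded).
url = get_record_url("oai:lindat.mff.cuni.cz:11372/LRT-4777")
print(url)

# OLAC-format request, assuming the repository exposes an "olac" prefix.
olac_url = get_record_url("oai:lindat.mff.cuni.cz:11372/LRT-4777", "olac")
print(olac_url)
```

Replacing `BASE_URL` with the repository's actual endpoint would yield a URL that can be fetched with any HTTP client; the response is an XML document wrapping the record's metadata.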

Search Info
Citation: Bučková, Aneta; Nekula, Marek; Lukeš, David; Woźniak, Michał; Wastl, Michael; Polowy, Louisa. 2023. Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague.
Terms: area_Europe country_CZ country_DE dcmi_Text iso639_ces iso639_deu olac_primary_text

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-4777
Up-to-date as of: Mon Jun 16 1:07:52 EDT 2025
