OLAC Record oai:www.ldc.upenn.edu:LDC2025T03 |
Metadata | ||
Title: | The Xi’an Multi-Language Learner Corpus | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Zhang, Xiao, et al. The Xi’an Multi-Language Learner Corpus LDC2025T03. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
Contributor: | Zhang, Xiao | |
Zhang, Ling | ||
Dang, Tian | ||
Feng, Yuanzhao | ||
Ji, Yujing | ||
Jiang, Xiaohui | ||
Kang, Zhewen | ||
Lu, Yan | ||
Nie, Wen | ||
Ren, Hanyu | ||
Wang, Canjun | ||
Wang, Jiayi | ||
Wang, Yu | ||
Wu, Chen | ||
Wu, Mei | ||
Xu, Tingting | ||
Yang, Ruhai | ||
Zhao, Kai | ||
Zhao, Ran | ||
Zhou, Quanjie | ||
Zhu, Lei | ||
Date (W3CDTF): | 2025 | |
Date Issued (W3CDTF): | 2025-03-17 | |
Description: | *Introduction* The Xi’an Multi-Language Learner Corpus was developed by Xi’an International Studies University (XISU). It comprises 526 argumentative essays in 15 languages written by Chinese L1 university students studying second languages, along with student metadata and writing prompts. It was developed to support second language learner research and to provide a database for cross-linguistic comparison of second languages. *Data* The essays were produced by undergraduate students at XISU and Yunnan Minzu University (YMU) in response to writing prompts prepared by the corpus development team. Data was collected in 2023 and 2024. Participating students were linguistics majors or were studying one of the foreign languages available at XISU and YMU. Off-topic essays and incomplete texts were excluded. All texts were cleaned and formatted. No changes were made to the texts with respect to the accuracy of grammatical tense or turn of phrase. Text and token counts by language (texts / tokens) are as follows: Arabic 8 / 1,762; English 107 / 32,822; Filipino 10 / 1,371; French 129 / 39,944; German 78 / 10,941; Hindi 16 / 2,972; Indonesian 14 / 2,630; Korean 24 / 2,630; Malay 36 / 5,208; Persian 12 / 1,751; Russian 33 / 8,018; Swahili 10 / 1,840; Thai 12 / 1,661; Turkish 22 / 3,719; Urdu 15 / 3,645. LancsBox X 4.0 was used to count tokens for Swahili, Persian, French, Urdu, and Hindi. AntConc 4.2.4 was used to count tokens in the other languages. The essays and writing prompts are stored in UTF-8 encoded plain text files. Metadata is presented in .csv files. *Samples* Sample text file (French) *Updates* None at this time | |
Extent: | Corpus size: 4735 KB | |
Identifier: | LDC2025T03 | |
https://catalog.ldc.upenn.edu/LDC2025T03 | ||
ISLRN: 615-404-265-320-6 | ||
DOI: r333-vr13 | ||
Language: | Arabic | |
Filipino | ||
English | ||
French | ||
German | ||
Hindi | ||
Indonesian | ||
Korean | ||
Malay (macrolanguage); Malay | ||
Persian | ||
Russian | ||
Swahili (macrolanguage); Swahili | ||
Thai | ||
Turkish | ||
Urdu | ||
Language (ISO639): | ara | |
fil | ||
eng | ||
fra | ||
deu | ||
hin | ||
ind | ||
kor | ||
msa | ||
fas | ||
rus | ||
swa | ||
tha | ||
tur | ||
urd | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025T03 | |
Rights Holder: | Portions © 2025 Xi’an International Studies University, © 2025 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025T03 | |
DateStamp: | 2025-03-17 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Zhang, Xiao; Zhang, Ling; Dang, Tian; Feng, Yuanzhao; Ji, Yujing; Jiang, Xiaohui; Kang, Zhewen; Lu, Yan; Nie, Wen; Ren, Hanyu; Wang, Canjun; Wang, Jiayi; Wang, Yu; Wu, Chen; Wu, Mei; Xu, Tingting; Yang, Ruhai; Zhao, Kai; Zhao, Ran; Zhou, Quanjie; Zhu, Lei. 2025. The Xi’an Multi-Language Learner Corpus. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_DE country_FR country_GB country_ID country_IN country_KR country_PH country_PK country_RU country_TH country_TR dcmi_Text iso639_ara iso639_deu iso639_eng iso639_fas iso639_fil iso639_fra iso639_hin iso639_ind iso639_kor iso639_msa iso639_rus iso639_swa iso639_tha iso639_tur iso639_urd |
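
The per-language counts in the Description field can be cross-checked against the stated total of 526 essays. The sketch below is illustrative only: the names `COUNTS` and `totals` are assumptions introduced here, not part of the corpus distribution, and the figures are copied from this record. The record does not state an overall token total, so the token sum is simply what the listed figures add up to.

```python
# Per-language (texts, tokens) pairs copied from the Description field of this record.
# COUNTS and totals() are hypothetical names used only for this illustration.
COUNTS = {
    "Arabic": (8, 1762),
    "English": (107, 32822),
    "Filipino": (10, 1371),
    "French": (129, 39944),
    "German": (78, 10941),
    "Hindi": (16, 2972),
    "Indonesian": (14, 2630),
    "Korean": (24, 2630),
    "Malay": (36, 5208),
    "Persian": (12, 1751),
    "Russian": (33, 8018),
    "Swahili": (10, 1840),
    "Thai": (12, 1661),
    "Turkish": (22, 3719),
    "Urdu": (15, 3645),
}


def totals(counts):
    """Return (total texts, total tokens) summed over all languages."""
    texts = sum(n_texts for n_texts, _ in counts.values())
    tokens = sum(n_tokens for _, n_tokens in counts.values())
    return texts, tokens


if __name__ == "__main__":
    n_texts, n_tokens = totals(COUNTS)
    print(f"{len(COUNTS)} languages, {n_texts} texts, {n_tokens:,} tokens")
```

The text total agrees with the 526 essays and 15 languages given in the Introduction. Because the token figures were produced by two different tools (LancsBox X and AntConc), the summed token count is best treated as approximate when comparing across languages.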