OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-2209 |
Metadata | ||
Title: | C4Corpus (publicdomain part) | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-2209 | |
Creator: | Gurevych, Iryna | |
Habernal, Ivan | ||
Zayed, Omnia | ||
Date (W3CDTF): | 2017-06-07T13:10:23Z | |
Date Available: | 2017-06-07T13:10:23Z | |
Description: | A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-2209 | |
Language: | Afrikaans | |
Arabic | ||
Bulgarian | ||
Czech | ||
Danish | ||
German | ||
Modern Greek (1453-) | ||
English | ||
Estonian | ||
Persian | ||
Finnish | ||
French | ||
Croatian | ||
Hungarian | ||
Indonesian | ||
Italian | ||
Japanese | ||
Korean | ||
Latvian | ||
Lithuanian | ||
Dutch | ||
Norwegian | ||
Polish | ||
Portuguese | ||
Russian | ||
Slovenian | ||
Somali | ||
Spanish | ||
Swahili (macrolanguage) | ||
Swedish | ||
Tagalog | ||
Thai | ||
Turkish | ||
Ukrainian | ||
Undetermined | ||
Vietnamese | ||
Language (ISO639): | afr | |
ara | ||
bul | ||
ces | ||
dan | ||
deu | ||
ell | ||
eng | ||
est | ||
fas | ||
fin | ||
fra | ||
hrv | ||
hun | ||
ind | ||
ita | ||
jpn | ||
kor | ||
lav | ||
lit | ||
nld | ||
nor | ||
pol | ||
por | ||
rus | ||
slv | ||
som | ||
spa | ||
swa | ||
swe | ||
tgl | ||
tha | ||
tur | ||
ukr | ||
und | ||
vie | ||
Publisher: | Technische Universität Darmstadt | |
Rights: | Public Domain Mark (PD) | |
http://creativecommons.org/publicdomain/mark/1.0/ | ||
Subject: | CommonCrawl | |
Creative Commons | ||
Web corpus | ||
Amazon Web Services | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-2209 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Gurevych, Iryna; Habernal, Ivan; Zayed, Omnia. 2017. C4Corpus (publicdomain part). Technische Universität Darmstadt. | |
Terms: | area_Africa area_Asia area_Europe country_BG country_CZ country_DE country_DK country_ES country_FI country_FR country_GB country_GR country_HR country_HU country_ID country_IT country_JP country_KR country_LT country_NL country_NO country_PH country_PL country_PT country_RU country_SE country_SI country_SO country_TH country_TR country_UA country_VN country_ZA dcmi_Text iso639_afr iso639_ara iso639_bul iso639_ces iso639_dan iso639_deu iso639_ell iso639_eng iso639_est iso639_fas iso639_fin iso639_fra iso639_hrv iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kor iso639_lav iso639_lit iso639_nld iso639_nor iso639_pol iso639_por iso639_rus iso639_slv iso639_som iso639_spa iso639_swa iso639_swe iso639_tgl iso639_tha iso639_tur iso639_ukr iso639_und iso639_vie olac_primary_text |
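
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch of how such a request URL is put together: the protocol's GetRecord verb takes an `identifier` and a `metadataPrefix` argument. The base URL below is a hypothetical placeholder, since this record does not state the repository's OAI-PMH endpoint, and the `olac` prefix is an assumption based on common OLAC practice (`oai_dc` is the prefix the protocol defines for simple Dublin Core). The helper name `get_record_url` is illustrative only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the record does not give the repository's base URL.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11372/LRT-2209"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,           # percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,  # e.g. "oai_dc"; "olac" is assumed
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_ID, "olac")   # OLAC format (assumed prefix)
dc_url = get_record_url(OAI_ID, "oai_dc")   # simple Dublin Core format

print(olac_url)
print(dc_url)
```

Swapping in the archive's real base URL should yield the same kind of requests as the GetRecord links listed in this record; the response is an XML document that any standard XML parser can read.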