OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-2205 |
Metadata | ||
Title: | C4Corpus (CC BY-NC-ND part) | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-2205 | |
Creator: | Gurevych, Iryna | |
Habernal, Ivan | ||
Zayed, Omnia | ||
Date (W3CDTF): | 2017-06-07T13:07:31Z | |
Date Available: | 2017-06-07T13:07:31Z | |
Description: | A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-2205 | |
Language: | Afrikaans | |
Arabic | ||
Bengali | ||
Bulgarian | ||
Czech | ||
Danish | ||
German | ||
Modern Greek (1453-) | ||
English | ||
Estonian | ||
Persian | ||
Finnish | ||
French | ||
Gujarati | ||
Hebrew | ||
Hindi | ||
Croatian | ||
Hungarian | ||
Indonesian | ||
Italian | ||
Japanese | ||
Kannada | ||
Korean | ||
Latvian | ||
Lithuanian | ||
Malayalam | ||
Marathi | ||
Macedonian | ||
Nepali (macrolanguage) | ||
Dutch | ||
Norwegian | ||
Polish | ||
Portuguese | ||
Romanian | ||
Russian | ||
Slovak | ||
Slovenian | ||
Somali | ||
Spanish | ||
Albanian | ||
Swahili (macrolanguage) | ||
Swedish | ||
Tamil | ||
Telugu | ||
Tagalog | ||
Thai | ||
Turkish | ||
Ukrainian | ||
Undetermined | ||
Urdu | ||
Vietnamese | ||
Chinese | ||
Language (ISO639): | afr | |
ara | ||
ben | ||
bul | ||
ces | ||
dan | ||
deu | ||
ell | ||
eng | ||
est | ||
fas | ||
fin | ||
fra | ||
guj | ||
heb | ||
hin | ||
hrv | ||
hun | ||
ind | ||
ita | ||
jpn | ||
kan | ||
kor | ||
lav | ||
lit | ||
mal | ||
mar | ||
mkd | ||
nep | ||
nld | ||
nor | ||
pol | ||
por | ||
ron | ||
rus | ||
slk | ||
slv | ||
som | ||
spa | ||
sqi | ||
swa | ||
swe | ||
tam | ||
tel | ||
tgl | ||
tha | ||
tur | ||
ukr | ||
und | ||
urd | ||
vie | ||
zho | ||
Publisher: | Technische Universität Darmstadt | |
Rights: | Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
http://creativecommons.org/licenses/by-nc-nd/4.0/ | ||
Subject: | CommonCrawl | |
Creative Commons | ||
Web corpus | ||
Amazon Web Services | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-2205 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Gurevych, Iryna; Habernal, Ivan; Zayed, Omnia. 2017. C4Corpus (CC BY-NC-ND part). Technische Universität Darmstadt. | |
Terms: | area_Africa area_Asia area_Europe country_BD country_BG country_CZ country_DE country_DK country_ES country_FI country_FR country_GB country_GR country_HR country_HU country_ID country_IL country_IN country_IT country_JP country_KR country_LT country_MK country_NL country_NO country_PH country_PK country_PL country_PT country_RO country_RU country_SE country_SI country_SK country_SO country_TH country_TR country_UA country_VN country_ZA dcmi_Text iso639_afr iso639_ara iso639_ben iso639_bul iso639_ces iso639_dan iso639_deu iso639_ell iso639_eng iso639_est iso639_fas iso639_fin iso639_fra iso639_guj iso639_heb iso639_hin iso639_hrv iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kan iso639_kor iso639_lav iso639_lit iso639_mal iso639_mar iso639_mkd iso639_nep iso639_nld iso639_nor iso639_pol iso639_por iso639_ron iso639_rus iso639_slk iso639_slv iso639_som iso639_spa iso639_sqi iso639_swa iso639_swe iso639_tam iso639_tel iso639_tgl iso639_tha iso639_tur iso639_ukr iso639_und iso639_urd iso639_vie iso639_zho olac_primary_text |
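
The GetRecord entries above refer to OAI-PMH requests for this record in two metadata formats. As a minimal sketch, such a request URL can be assembled from the record's OAI identifier and a metadata prefix. The base URL below is a placeholder, not the archive's actual endpoint, and the function name is hypothetical; the prefixes `olac` and `oai_dc` are assumed to correspond to the OLAC and simple DC formats named in the record.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real archive URL.
BASE_URL = "https://example.org/oai"

# OAI identifier taken from the record's OaiIdentifier field.
IDENTIFIER = "oai:lindat.mff.cuni.cz:11372/LRT-2205"


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple Dublin Core.
olac_url = build_get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = build_get_record_url(BASE_URL, IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Substituting the archive's real OAI-PMH endpoint for the placeholder should return the record as XML in the requested format.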