OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9
Metadata
Title: | W2C – Web to Corpus – Corpora | |
Bibliographic Citation: | http://hdl.handle.net/11858/00-097C-0000-0022-6133-9 | |
Creator: | Majliš, Martin | |
Date (W3CDTF): | 2013-06-25T15:08:15Z | |
Date Available: | 2013-06-25T15:08:15Z | |
Description: | A set of corpora for 120 languages, automatically collected from Wikipedia and the web using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1 | |
Identifier (URI): | http://hdl.handle.net/11858/00-097C-0000-0022-6133-9 | |
Language: | Afrikaans | |
Tosk Albanian | ||
Amharic | ||
Arabic | ||
Aragonese | ||
Egyptian Arabic | ||
Asturian | ||
Azerbaijani | ||
Belarusian | ||
Bengali | ||
Bosnian | ||
Bishnupriya | ||
Breton | ||
Buginese | ||
Bulgarian | ||
Catalan | ||
Cebuano | ||
Czech | ||
Chuvash | ||
Corsican | ||
Welsh | ||
Danish | ||
German | ||
Dimli (individual language) | ||
Modern Greek (1453-) | ||
English | ||
Esperanto | ||
Estonian | ||
Basque | ||
Faroese | ||
Persian | ||
Finnish | ||
French | ||
Western Frisian | ||
Gan Chinese | ||
Scottish Gaelic | ||
Irish | ||
Galician | ||
Gilaki | ||
Gujarati | ||
Haitian | ||
Serbo-Croatian | ||
Hebrew | ||
Fiji Hindi | ||
Hindi | ||
Croatian | ||
Upper Sorbian | ||
Hungarian | ||
Armenian | ||
Ido | ||
Interlingua (International Auxiliary Language Association) | ||
Indonesian | ||
Icelandic | ||
Italian | ||
Javanese | ||
Japanese | ||
Kannada | ||
Georgian | ||
Kazakh | ||
Korean | ||
Kurdish | ||
Latin | ||
Latvian | ||
Limburgan | ||
Lithuanian | ||
Lombard | ||
Luxembourgish | ||
Malayalam | ||
Marathi | ||
Macedonian | ||
Malagasy | ||
Mongolian | ||
Maori | ||
Malay (macrolanguage) | ||
Burmese | ||
Neapolitan | ||
Low German | ||
Nepali (macrolanguage) | ||
Newari | ||
Dutch | ||
Norwegian Nynorsk | ||
Norwegian | ||
Occitan (post 1500) | ||
Ossetian | ||
Pampanga | ||
Piemontese | ||
Polish | ||
Portuguese | ||
Quechua | ||
Romanian | ||
Russian | ||
Yakut | ||
Sicilian | ||
Scots | ||
Slovak | ||
Slovenian | ||
Spanish | ||
Albanian | ||
Serbian | ||
Sundanese | ||
Swahili (macrolanguage) | ||
Swedish | ||
Tamil | ||
Tatar | ||
Telugu | ||
Tajik | ||
Tagalog | ||
Thai | ||
Turkish | ||
Ukrainian | ||
Urdu | ||
Uzbek | ||
Venetian | ||
Vietnamese | ||
Volapük | ||
Waray (Philippines) | ||
Walloon | ||
Yiddish | ||
Yoruba | ||
Chinese | ||
Language (ISO639): | afr | |
als | ||
amh | ||
ara | ||
arg | ||
arz | ||
ast | ||
aze | ||
bel | ||
ben | ||
bos | ||
bpy | ||
bre | ||
bug | ||
bul | ||
cat | ||
ceb | ||
ces | ||
chv | ||
cos | ||
cym | ||
dan | ||
deu | ||
diq | ||
ell | ||
eng | ||
epo | ||
est | ||
eus | ||
fao | ||
fas | ||
fin | ||
fra | ||
fry | ||
gan | ||
gla | ||
gle | ||
glg | ||
glk | ||
guj | ||
hat | ||
hbs | ||
heb | ||
hif | ||
hin | ||
hrv | ||
hsb | ||
hun | ||
hye | ||
ido | ||
ina | ||
ind | ||
isl | ||
ita | ||
jav | ||
jpn | ||
kan | ||
kat | ||
kaz | ||
kor | ||
kur | ||
lat | ||
lav | ||
lim | ||
lit | ||
lmo | ||
ltz | ||
mal | ||
mar | ||
mkd | ||
mlg | ||
mon | ||
mri | ||
msa | ||
mya | ||
nap | ||
nds | ||
nep | ||
new | ||
nld | ||
nno | ||
nor | ||
oci | ||
oss | ||
pam | ||
pms | ||
pol | ||
por | ||
que | ||
ron | ||
rus | ||
sah | ||
scn | ||
sco | ||
slk | ||
slv | ||
spa | ||
sqi | ||
srp | ||
sun | ||
swa | ||
swe | ||
tam | ||
tat | ||
tel | ||
tgk | ||
tgl | ||
tha | ||
tur | ||
ukr | ||
urd | ||
uzb | ||
vec | ||
vie | ||
vol | ||
war | ||
wln | ||
yid | ||
yor | ||
zho | ||
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) | |
http://creativecommons.org/licenses/by-sa/3.0/ | ||
Subject: | multilingual corpora | |
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
OAI Info
OaiIdentifier: | oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-6133-9 | |
DateStamp: | 2021-06-29 | |
Search Info
Citation: | Majliš, Martin. 2013. W2C – Web to Corpus – Corpora. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Africa area_Americas area_Asia area_Europe area_Pacific country_AL country_AM country_BA country_BD country_BE country_BG country_BY country_CN country_CZ country_DE country_DK country_EG country_ES country_ET country_FI country_FJ country_FR country_GB country_GE country_GR country_HR country_HT country_HU country_ID country_IE country_IL country_IN country_IR country_IS country_IT country_JP country_KR country_KZ country_LT country_LU country_MK country_MM country_NG country_NL country_NO country_NP country_NZ country_PH country_PK country_PL country_PT country_RO country_RS country_RU country_SE country_SI country_SK country_TH country_TJ country_TR country_UA country_VA country_VN country_ZA dcmi_Text iso639_afr iso639_als iso639_amh iso639_ara iso639_arg iso639_arz iso639_ast iso639_aze iso639_bel iso639_ben iso639_bos iso639_bpy iso639_bre iso639_bug iso639_bul iso639_cat iso639_ceb iso639_ces iso639_chv iso639_cos iso639_cym iso639_dan iso639_deu iso639_diq iso639_ell iso639_eng iso639_epo iso639_est iso639_eus iso639_fao iso639_fas iso639_fin iso639_fra iso639_fry iso639_gan iso639_gla iso639_gle iso639_glg iso639_glk iso639_guj iso639_hat iso639_hbs iso639_heb iso639_hif iso639_hin iso639_hrv iso639_hsb iso639_hun iso639_hye iso639_ido iso639_ina iso639_ind iso639_isl iso639_ita iso639_jav iso639_jpn iso639_kan iso639_kat iso639_kaz iso639_kor iso639_kur iso639_lat iso639_lav iso639_lim iso639_lit iso639_lmo iso639_ltz iso639_mal iso639_mar iso639_mkd iso639_mlg iso639_mon iso639_mri iso639_msa iso639_mya iso639_nap iso639_nds iso639_nep iso639_new iso639_nld iso639_nno iso639_nor iso639_oci iso639_oss iso639_pam iso639_pms iso639_pol iso639_por iso639_que iso639_ron iso639_rus iso639_sah iso639_scn iso639_sco iso639_slk iso639_slv iso639_spa iso639_sqi iso639_srp iso639_sun iso639_swa iso639_swe iso639_tam iso639_tat iso639_tel iso639_tgk iso639_tgl iso639_tha iso639_tur iso639_ukr iso639_urd iso639_uzb iso639_vec iso639_vie iso639_vol iso639_war iso639_wln iso639_yid iso639_yor iso639_zho olac_primary_text |