OLAC Record oai:lindat.mff.cuni.cz:11234/1-1743 |
Metadata | ||
Title: | Deltacorpus 1.1 | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-1743 | |
Creator: | Mareček, David | |
Yu, Zhiwei | ||
Zeman, Daniel | ||
Žabokrtský, Zdeněk | ||
Date (W3CDTF): | 2016-06-27T12:27:25Z | |
Date Available: | 2016-06-27T12:27:25Z | |
Description: | Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), the first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset. 2. The SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic and Romance languages were each tagged by a classifier trained only on the respective group of languages; all other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used. | |
Identifier (URI): | http://hdl.handle.net/11234/1-1743 | |
Language: | Belarusian | |
Bosnian | ||
Bulgarian | ||
Czech | ||
Serbo-Croatian | ||
Croatian | ||
Upper Sorbian | ||
Macedonian | ||
Polish | ||
Russian | ||
Slovak | ||
Slovenian | ||
Serbian | ||
Ukrainian | ||
Latvian | ||
Lithuanian | ||
Afrikaans | ||
Danish | ||
German | ||
English | ||
Faroese | ||
Western Frisian | ||
Swiss German | ||
Icelandic | ||
Limburgan | ||
Luxembourgish | ||
Low German | ||
Dutch | ||
Norwegian Nynorsk | ||
Norwegian | ||
Scots | ||
Swedish | ||
Yiddish | ||
Aragonese | ||
Asturian | ||
Catalan | ||
French | ||
Galician | ||
Haitian | ||
Italian | ||
Latin | ||
Lombard | ||
Neapolitan | ||
Piemontese | ||
Portuguese | ||
Romanian | ||
Spanish | ||
Venetian | ||
Walloon | ||
Breton | ||
Welsh | ||
Scottish Gaelic | ||
Irish | ||
Modern Greek (1453-) | ||
Armenian | ||
Albanian | ||
Dimli (individual language) | ||
Persian | ||
Gilaki | ||
Kurdish | ||
Tajik | ||
Bengali | ||
Bishnupriya | ||
Gujarati | ||
Fiji Hindi | ||
Hindi | ||
Marathi | ||
Nepali (macrolanguage) | ||
Urdu | ||
Amharic | ||
Arabic | ||
Egyptian Arabic | ||
Hebrew | ||
Estonian | ||
Finnish | ||
Hungarian | ||
Basque | ||
Georgian | ||
Chuvash | ||
Azerbaijani | ||
Turkish | ||
Uzbek | ||
Kazakh | ||
Tatar | ||
Yakut | ||
Korean | ||
Mongolian | ||
Telugu | ||
Kannada | ||
Malayalam | ||
Tamil | ||
Newari | ||
Vietnamese | ||
Indonesian | ||
Javanese | ||
Malagasy | ||
Maori | ||
Malay (macrolanguage) | ||
Pampanga | ||
Sundanese | ||
Tagalog | ||
Waray (Philippines) | ||
Swahili (macrolanguage) | ||
Esperanto | ||
Ido | ||
Interlingua (International Auxiliary Language Association) | ||
Volapük | ||
Language (ISO639): | bel | |
bos | ||
bul | ||
ces | ||
hbs | ||
hrv | ||
hsb | ||
mkd | ||
pol | ||
rus | ||
slk | ||
slv | ||
srp | ||
ukr | ||
lav | ||
lit | ||
afr | ||
dan | ||
deu | ||
eng | ||
fao | ||
fry | ||
gsw | ||
isl | ||
lim | ||
ltz | ||
nds | ||
nld | ||
nno | ||
nor | ||
sco | ||
swe | ||
yid | ||
arg | ||
ast | ||
cat | ||
fra | ||
glg | ||
hat | ||
ita | ||
lat | ||
lmo | ||
nap | ||
pms | ||
por | ||
ron | ||
spa | ||
vec | ||
wln | ||
bre | ||
cym | ||
gla | ||
gle | ||
ell | ||
hye | ||
sqi | ||
diq | ||
fas | ||
glk | ||
kur | ||
tgk | ||
ben | ||
bpy | ||
guj | ||
hif | ||
hin | ||
mar | ||
nep | ||
urd | ||
amh | ||
ara | ||
arz | ||
heb | ||
est | ||
fin | ||
hun | ||
eus | ||
kat | ||
chv | ||
aze | ||
tur | ||
uzb | ||
kaz | ||
tat | ||
sah | ||
kor | ||
mon | ||
tel | ||
kan | ||
mal | ||
tam | ||
new | ||
vie | ||
ind | ||
jav | ||
mlg | ||
mri | ||
msa | ||
pam | ||
sun | ||
tgl | ||
war | ||
swa | ||
epo | ||
ido | ||
ina | ||
vol | ||
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Replaces (URI): | http://hdl.handle.net/11234/1-1662 | |
Rights: | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
http://creativecommons.org/licenses/by-sa/4.0/ | ||
Subject: | part of speech | |
tagging | ||
semi-supervised | ||
cross-language | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
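The classifier selection described in change 3 of the Description field can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pick_classifier`, the group labels, and the small sample of ISO 639-3 codes in each group are hypothetical, since the record does not enumerate which language belongs to which group.

```python
# Illustrative sample only: the record does not list group membership,
# so these sets contain just a few uncontroversial members per group.
GROUPS = {
    "balto-slavic": {"ces", "pol", "rus", "lit", "lav"},
    "germanic": {"deu", "eng", "swe", "nld"},
    "romance": {"fra", "spa", "ita", "ron"},
}


def pick_classifier(iso639: str) -> str:
    """Return the label of the training set assumed for a language code.

    Languages in one of the three groups get the group-specific classifier;
    every other language falls back to the classifier trained on all
    available languages.
    """
    for group, codes in GROUPS.items():
        if iso639 in codes:
            return group
    return "all"
```

For example, `pick_classifier("ces")` selects the Balto-Slavic classifier, while `pick_classifier("hun")` falls back to the classifier trained on all languages.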
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-1743 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
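The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL is formed is given below. The base URL is a hypothetical placeholder, because the record does not state the repository's OAI-PMH endpoint. The `oai_dc` prefix is the one OAI-PMH requires for simple Dublin Core; the `olac` prefix is assumed here as the conventional name for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-1743"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")   # assumed OLAC prefix
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")   # simple DC format
```

Fetching either URL with any HTTP client should return an XML response from a conforming repository, provided the placeholder base URL is replaced with the real endpoint.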
Search Info | ||
Citation: | Mareček, David; Yu, Zhiwei; Zeman, Daniel; Žabokrtský, Zdeněk. 2016. Deltacorpus 1.1. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Africa area_Americas area_Asia area_Europe area_Pacific country_AM country_BA country_BD country_BE country_BG country_BY country_CH country_CZ country_DE country_DK country_EG country_ES country_ET country_FI country_FJ country_FR country_GB country_GE country_GR country_HR country_HT country_HU country_ID country_IE country_IL country_IN country_IR country_IS country_IT country_KR country_KZ country_LT country_LU country_MK country_NL country_NO country_NP country_NZ country_PH country_PK country_PL country_PT country_RO country_RS country_RU country_SE country_SI country_SK country_TJ country_TR country_UA country_VA country_VN country_ZA dcmi_Text iso639_afr iso639_amh iso639_ara iso639_arg iso639_arz iso639_ast iso639_aze iso639_bel iso639_ben iso639_bos iso639_bpy iso639_bre iso639_bul iso639_cat iso639_ces iso639_chv iso639_cym iso639_dan iso639_deu iso639_diq iso639_ell iso639_eng iso639_epo iso639_est iso639_eus iso639_fao iso639_fas iso639_fin iso639_fra iso639_fry iso639_gla iso639_gle iso639_glg iso639_glk iso639_gsw iso639_guj iso639_hat iso639_hbs iso639_heb iso639_hif iso639_hin iso639_hrv iso639_hsb iso639_hun iso639_hye iso639_ido iso639_ina iso639_ind iso639_isl iso639_ita iso639_jav iso639_kan iso639_kat iso639_kaz iso639_kor iso639_kur iso639_lat iso639_lav iso639_lim iso639_lit iso639_lmo iso639_ltz iso639_mal iso639_mar iso639_mkd iso639_mlg iso639_mon iso639_mri iso639_msa iso639_nap iso639_nds iso639_nep iso639_new iso639_nld iso639_nno iso639_nor iso639_pam iso639_pms iso639_pol iso639_por iso639_ron iso639_rus iso639_sah iso639_sco iso639_slk iso639_slv iso639_spa iso639_sqi iso639_srp iso639_sun iso639_swa iso639_swe iso639_tam iso639_tat iso639_tel iso639_tgk iso639_tgl iso639_tur iso639_ukr iso639_urd iso639_uzb iso639_vec iso639_vie iso639_vol iso639_war iso639_wln iso639_yid olac_primary_text |