OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-5124

Metadata
Title:PARSEME corpora annotated for verbal multiword expressions (version 1.3)
Bibliographic Citation:http://hdl.handle.net/11372/LRT-5124
Creator:Savary, Agata
Ramisch, Carlos
Guillaume, Bruno
Hawwari, Abdelati
Walsh, Abigail
Fotopoulou, Aggeliki
Bielinskienė, Agnė
Estarrona, Ainara
Gatt, Albert
Butler, Alexandra
Rademaker, Alexandre
Maldonado, Alfredo
Villavicencio, Aline
Farrugia, Alison
Muscat, Amanda
Gatt, Anabelle
Antić, Anđela
De Santis, Anna
Raffone, Annalisa
Riccio, Anna
Pascucci, Antonio
Gurrutxaga, Antton
Bhatia, Archna
Vaidya, Ashwini
Miral, Ayşenur
QasemiZadeh, Behrang
Priego Sanchez, Belem
Griciūtė, Bernadeta
Erden, Berna
Parra Escartín, Carla
Herrero, Carlos
Carlino, Carola
Pasquer, Caroline
Liebeskind, Chaya
Wang, Chenweng
Ben Khelil, Chérifa
Bonial, Claire
Somers, Clarissa
Aceta, Cristina
Krstev, Cvetana
Bejček, Eduard
Lindqvist, Ellinor
Erenmalm, Elsa
Palka-Binkiewicz, Emilia
Rimkute, Erika
Petterson, Eva
Cap, Fabienne
Hu, Fangyuan
Sangati, Federico
Wick Pedro, Gabriela
Speranza, Giulia
Jagfeld, Glorianna
Blagus, Goranka
Berk, Gözde
Attard, Greta
Eryiğit, Gülşen
Finnveden, Gustav
Martínez Alonso, Héctor
de Medeiros Caseli, Helena
Elyovich, Hevi
Xu, Hongzhi
Xiao, Huangyang
Miranda, Isaac
Jaknić, Isidora
El Maarouf, Ismail
Aduriz, Itziar
Gonzalez, Itziar
Matas, Ivana
Stoyanova, Ivelina
Jazbec, Ivo-Pavao
Busuttil, Jael
Waszczuk, Jakub
Findlay, Jamie
Bonnici, Janice
Šnajder, Jan
Antoine, Jean-Yves
Foster, Jennifer
Chen, Jia
Nivre, Joakim
Monti, Johanna
McCrae, John
Kovalevskaitė, Jolanta
Jain, Kanishka
Simkó, Katalin
Yu, Ke
Azzopardi, Kirsty
Adalı, Kübra
Uria, Larraitz
Zilio, Leonardo
Boizou, Loïc
van der Plas, Lonneke
Galea, Luke
Sarlak, Mahtab
Buljan, Maja
Cherchi, Manuela
Tanti, Marc
Di Buono, Maria Pia
Todorova, Maria
Candito, Marie
Constant, Matthieu
Shamsfard, Mehrnoush
Jiang, Menghan
Boz, Mert
Spagnol, Michael
Onofrei, Mihaela
Li, Minli
Elbadrashiny, Mohamed
Diab, Mona
Rizea, Monica-Mihaela
Hadj Mohamed, Najet
Theoxari, Natasa
Schneider, Nathan
Tabone, Nicole
Ljubešić, Nikola
Vale, Oto
Cook, Paul
Yan, Peiyi
Gantar, Polona
Ehren, Rafael
Fabri, Ray
Ibrahim, Rehab
Ramisch, Renata
Walles, Rinat
Wilkens, Rodrigo
Urizar, Ruben
Sun, Ruilong
Malka, Ruth
Galea, Sara Anne
Stymne, Sara
Louizou, Sevasti
Hu, Sha
Taslimipoor, Shiva
Ratori, Shraddha
Srivastava, Shubham
Cordeiro, Silvio Ricardo
Krek, Simon
Liu, Siyuan
Zeng, Si
Yu, Songping
Arhar Holdt, Špela
Markantonatou, Stella
Papadelli, Stella
Leseva, Svetlozara
Kuzman, Taja
Kavčič, Teja
Lynn, Teresa
Lichte, Timm
Pickard, Thomas
Dimitrova, Tsvetana
Yih, Tsy
Güngör, Tunga
Dinç, Tutkum
Iñurrieta, Uxoa
Tajalli, Vahide
Stefanova, Valentina
Caruso, Valeria
Puri, Vandana
Foufi, Vassiliki
Barbu Mititelu, Verginica
Vincze, Veronika
Kovács, Viktória
Shukla, Vishakha
Giouli, Voula
Ge, Xiaomin
Ha-Cohen Kerner, Yaakov
Öztürk, Yağmur
Yarandi, Yalda
Parmentier, Yannick
Zhang, Yongchen
Zhao, Yun
Urešová, Zdeňka
Yirmibeşoğlu, Zeynep
Qin, Zhenzhen
Stank
Cristescu, Mihaela
Zgreabăn, Bianca-Mădălina
Bărbulescu, Elena-Andreea
Stanković, Ranka
Date (W3CDTF):2023-05-10T11:36:56Z
Date Available:2023-05-10T11:36:56Z
Description:This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on semi-supervised Identification of Verbal MWEs (2020). The data covers 26 languages, combining the corpora from the three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3 The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format/
Identifier (URI):http://hdl.handle.net/11372/LRT-5124
Language:Arabic
Bulgarian
Czech
German
Modern Greek (1453-)
English
Spanish
Basque
Persian
French
Irish
Hebrew
Hindi
Croatian
Hungarian
Lithuanian
Italian
Maltese
Polish
Portuguese
Romanian
Slovenian
Serbian
Swedish
Turkish
Chinese
Language (ISO639):ara
bul
ces
deu
ell
eng
spa
eus
fas
fra
gle
heb
hin
hrv
hun
lit
ita
mlt
pol
por
ron
slv
srp
swe
tur
zho
Publisher:PARSEME
Replaces (URI):http://hdl.handle.net/11234/1-3367
Rights:PARSEME Corpora v. 1.3 - Licence Agreement
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.3
Subject:multiword expressions
verbal multiword expressions
light verb construction
verb-particle constructions
inherently reflexive verbs
verbal idioms
multi-verb constructions
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
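
Reading the cupt format (illustrative sketch): the description above says the corpora use the cupt format, inspired by CoNLL-U. The following minimal Python sketch assumes the layout documented on the cupt-format page: tab-separated CoNLL-U columns, with the VMWE annotation in the last column, where "*" marks a token outside any VMWE, "_" marks an unspecified annotation, "1:VPC.full" opens VMWE number 1 with its category, and a bare "1" continues it. The function name parse_cupt_sentence and the sample sentence are hypothetical and are not taken from the corpora.

```python
from collections import defaultdict


def parse_cupt_sentence(lines):
    """Group the tokens of one sentence by VMWE identifier.

    Assumes the VMWE annotation is the last tab-separated column and that
    multiple annotations on one token are separated by ';'.
    """
    mwes = defaultdict(lambda: {"category": None, "tokens": []})
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        cols = line.split("\t")
        form, tag = cols[1], cols[-1]
        if tag in ("*", "_"):
            continue  # token is not part of an annotated VMWE
        for part in tag.split(";"):
            mwe_id, _, category = part.partition(":")
            mwes[mwe_id]["tokens"].append(form)
            if category:  # only the first token of a VMWE carries the category
                mwes[mwe_id]["category"] = category
    return dict(mwes)


# Hypothetical sample sentence, written by hand for illustration only.
SAMPLE = "\n".join([
    "# text = She gave up",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_\t1:VPC.full",
    "3\tup\tup\tADP\t_\t_\t2\tcompound:prt\t_\t_\t1",
])

if __name__ == "__main__":
    print(parse_cupt_sentence(SAMPLE.splitlines()))
```

The sketch handles a single sentence; a full reader would also split the file on blank lines and check the column header line before trusting the column order.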

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-5124
DateStamp:  2023-05-10
GetRecord:  OAI-PMH request for simple DC format
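
Building a GetRecord request (illustrative sketch): the GetRecord entries above refer to OAI-PMH requests for this record. In OAI-PMH, such a request is an HTTP query with verb=GetRecord, the record identifier, and a metadataPrefix. The Python sketch below only constructs the URL. BASE_URL is a placeholder, because the repository's OAI-PMH endpoint is not given in this record. The prefix oai_dc is the simple Dublin Core format that every OAI-PMH repository must support, while the prefix olac for the OLAC format is an assumption here.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai/request"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Return an OAI-PMH GetRecord URL for the given identifier and format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    record_id = "oai:lindat.mff.cuni.cz:11372/LRT-5124"
    print(get_record_url(record_id, "oai_dc"))  # simple DC format
    print(get_record_url(record_id, "olac"))    # OLAC format (assumed prefix)
```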

Search Info

Citation: Savary, Agata; Ramisch, Carlos; Guillaume, Bruno; Hawwari, Abdelati; Walsh, Abigail; Fotopoulou, Aggeliki; Bielinskienė, Agnė; Estarrona, Ainara; Gatt, Albert; Butler, Alexandra; Rademaker, Alexandre; Maldonado, Alfredo; Villavicencio, Aline; Farrugia, Alison; Muscat, Amanda; Gatt, Anabelle; Antić, Anđela; De Santis, Anna; Raffone, Annalisa; Riccio, Anna; Pascucci, Antonio; Gurrutxaga, Antton; Bhatia, Archna; Vaidya, Ashwini; Miral, Ayşenur; QasemiZadeh, Behrang; Priego Sanchez, Belem; Griciūtė, Bernadeta; Erden, Berna; Parra Escartín, Carla; Herrero, Carlos; Carlino, Carola; Pasquer, Caroline; Liebeskind, Chaya; Wang, Chenweng; Ben Khelil, Chérifa; Bonial, Claire; Somers, Clarissa; Aceta, Cristina; Krstev, Cvetana; Bejček, Eduard; Lindqvist, Ellinor; Erenmalm, Elsa; Palka-Binkiewicz, Emilia; Rimkute, Erika; Petterson, Eva; Cap, Fabienne; Hu, Fangyuan; Sangati, Federico; Wick Pedro, Gabriela; Speranza, Giulia; Jagfeld, Glorianna; Blagus, Goranka; Berk, Gözde; Attard, Greta; Eryiğit, Gülşen; Finnveden, Gustav; Martínez Alonso, Héctor; de Medeiros Caseli, Helena; Elyovich, Hevi; Xu, Hongzhi; Xiao, Huangyang; Miranda, Isaac; Jaknić, Isidora; El Maarouf, Ismail; Aduriz, Itziar; Gonzalez, Itziar; Matas, Ivana; Stoyanova, Ivelina; Jazbec, Ivo-Pavao; Busuttil, Jael; Waszczuk, Jakub; Findlay, Jamie; Bonnici, Janice; Šnajder, Jan; Antoine, Jean-Yves; Foster, Jennifer; Chen, Jia; Nivre, Joakim; Monti, Johanna; McCrae, John; Kovalevskaitė, Jolanta; Jain, Kanishka; Simkó, Katalin; Yu, Ke; Azzopardi, Kirsty; Adalı, Kübra; Uria, Larraitz; Zilio, Leonardo; Boizou, Loïc; van der Plas, Lonneke; Galea, Luke; Sarlak, Mahtab; Buljan, Maja; Cherchi, Manuela; Tanti, Marc; Di Buono, Maria Pia; Todorova, Maria; Candito, Marie; Constant, Matthieu; Shamsfard, Mehrnoush; Jiang, Menghan; Boz, Mert; Spagnol, Michael; Onofrei, Mihaela; Li, Minli; Elbadrashiny, Mohamed; Diab, Mona; Rizea, Monica-Mihaela; Hadj Mohamed, Najet; Theoxari, Natasa; Schneider, Nathan; Tabone, Nicole; 
Ljubešić, Nikola; Vale, Oto; Cook, Paul; Yan, Peiyi; Gantar, Polona; Ehren, Rafael; Fabri, Ray; Ibrahim, Rehab; Ramisch, Renata; Walles, Rinat; Wilkens, Rodrigo; Urizar, Ruben; Sun, Ruilong; Malka, Ruth; Galea, Sara Anne; Stymne, Sara; Louizou, Sevasti; Hu, Sha; Taslimipoor, Shiva; Ratori, Shraddha; Srivastava, Shubham; Cordeiro, Silvio Ricardo; Krek, Simon; Liu, Siyuan; Zeng, Si; Yu, Songping; Arhar Holdt, Špela; Markantonatou, Stella; Papadelli, Stella; Leseva, Svetlozara; Kuzman, Taja; Kavčič, Teja; Lynn, Teresa; Lichte, Timm; Pickard, Thomas; Dimitrova, Tsvetana; Yih, Tsy; Güngör, Tunga; Dinç, Tutkum; Iñurrieta, Uxoa; Tajalli, Vahide; Stefanova, Valentina; Caruso, Valeria; Puri, Vandana; Foufi, Vassiliki; Barbu Mititelu, Verginica; Vincze, Veronika; Kovács, Viktória; Shukla, Vishakha; Giouli, Voula; Ge, Xiaomin; Ha-Cohen Kerner, Yaakov; Öztürk, Yağmur; Yarandi, Yalda; Parmentier, Yannick; Zhang, Yongchen; Zhao, Yun; Urešová, Zdeňka; Yirmibeşoğlu, Zeynep; Qin, Zhenzhen; Stank; Cristescu, Mihaela; Zgreabăn, Bianca-Mădălina; Bărbulescu, Elena-Andreea; Stanković, Ranka. 2023. PARSEME.
Terms: area_Asia area_Europe country_BG country_CZ country_DE country_ES country_FR country_GB country_GR country_HR country_HU country_IE country_IL country_IN country_IT country_LT country_MT country_PL country_PT country_RO country_RS country_SE country_SI country_TR dcmi_Text iso639_ara iso639_bul iso639_ces iso639_deu iso639_ell iso639_eng iso639_eus iso639_fas iso639_fra iso639_gle iso639_heb iso639_hin iso639_hrv iso639_hun iso639_ita iso639_lit iso639_mlt iso639_pol iso639_por iso639_ron iso639_slv iso639_spa iso639_srp iso639_swe iso639_tur iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-5124
Up-to-date as of: Thu Oct 5 0:43:34 EDT 2023