OLAC Record oai:catalogue.elra.info:ELRA-W0085 |
Metadata | ||
Title: | ROCO Romanian journalistic corpus | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2015-11-30 | |
Date Issued (W3CDTF): | 2015-11-30 | |
Date Modified (W3CDTF): | 2015-11-30 | |
Description: | ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens and 231,626 types. It is rich in proper names, numerals and named entities. The corpus contains morphosyntactic information (MSD annotations) assigned automatically by the TTL tagger, which implements the tiered tagging methodology and has an estimated accuracy of 98%. About 20% of the MSD annotations have been manually checked, validated and, where necessary, corrected. The MSDs follow the Multext-East specifications. For Romanian there are 614 different MSDs. They have been slightly modified (new tags for named entities have been added). The corpus was first segmented, then PoS-annotated and lemmatized with the TTL processing chain. The corpus is XML-encoded and each file includes metadata (cesHeader). | |
Identifier: | ELRA-W0085 | |
ISLRN: 312-617-089-348-7 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0085/ | |
Language: | Romanian; Moldavian; Moldovan | |
Language (ISO639): | ron | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0085 | |
DateStamp: | 2015-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2015. ELRA (European Language Resources Association). | |
Terms: | area_Europe country_RO dcmi_Text iso639_ron olac_primary_text |
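
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC format. As a minimal sketch of how such a request URL is typically assembled: the `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, and the identifier is the OaiIdentifier listed in this record. The base URL is a hypothetical placeholder, since the record does not state the archive's actual endpoint, and the helper name `get_record_url` is an illustration only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# OaiIdentifier as listed in this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0085"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # unique item identifier in the repository
        "metadataPrefix": metadata_prefix,  # "olac" or "oai_dc" (simple DC)
    })
    return f"{base_url}?{query}"


# One URL per metadata format mentioned in the record.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing `BASE_URL` with the archive's real endpoint would be needed before the URLs could be fetched; the sketch only shows how the query string is put together.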