OLAC Record oai:catalogue.elra.info:ELRA-W0073
Metadata
Title: | Quaero Old Press Extended Named Entity corpus | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2013-02-13 | |
Date Issued (W3CDTF): | 2013-02-13 | |
Date Modified (W3CDTF): | 2017-02-09 | |
Description: | The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro), for a total of 295 pages. The corpus is fully manually annotated according to the Quaero extended and structured named entity definition, which distinguishes entity "types" from "components". The training part of the corpus comprises 231 pages and contains 1,297,742 words, 114,599 types and 136,113 components. The test part comprises 64 pages and contains 363,455 words, 33,083 types and 40,432 components. The Quaero Old Press Extended Named Entity Corpus consists of: the 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France), as images and OCR output; 295 extracted pages in text format, along with the corresponding images; the fully annotated corpus in .txt format, amounting to about 1.3 million words; a sub-corpus serving as a mini-reference corpus for quality evaluation; tools developed for text and image extraction, for annotation and for evaluation; guidelines. | |
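The training/test figures in the description can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers exactly as stated in the record; the dictionary names are illustrative, not part of the corpus distribution. Summing the two parts reproduces the 295 pages; note that the training part alone accounts for roughly 1.3 million words.

```python
# Figures copied verbatim from the ELRA-W0073 description.
training = {"pages": 231, "words": 1_297_742, "types": 114_599, "components": 136_113}
test_part = {"pages": 64, "words": 363_455, "types": 33_083, "components": 40_432}

# Combined size of the corpus and the fraction held by the training part.
totals = {key: training[key] + test_part[key] for key in training}
train_share = {key: training[key] / totals[key] for key in training}

for key in training:
    print(f"{key:>10}: total {totals[key]:>9,}  training share {train_share[key]:.1%}")
```

With these inputs, the totals come to 295 pages, 1,661,197 words, 147,682 types and 176,545 components.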
Identifier: | ELRA-W0073 | |
ISLRN: | 864-217-681-552-4 | |
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0073/ | |
Language: | French | |
Language (ISO639): | fra | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0073 | |
DateStamp: | 2013-02-13 | |
GetRecord: | OAI-PMH request for simple DC format | |
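The GetRecord entries above correspond to standard OAI-PMH requests built from the verb, the OAI identifier and a metadata prefix. The sketch below shows how such request URLs could be assembled. It assumes the prefixes `olac` (OLAC format) and `oai_dc` (simple Dublin Core), and it uses a hypothetical endpoint, `BASE_URL`, because this record does not state the provider's OAI-PMH base URL.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the data provider's base URL.
BASE_URL = "https://oai.example.org/provider"
OAI_ID = "oai:catalogue.elra.info:ELRA-W0073"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL; urlencode percent-encodes ':' as %3A."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    return f"{base_url}?{urlencode(params)}"


olac_url = get_record_url(BASE_URL, OAI_ID, "olac")  # assumed OLAC metadata prefix
dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")  # simple Dublin Core
print(olac_url)
print(dc_url)
```

Percent-encoding the colons in the identifier keeps the query string unambiguous, and an OAI-PMH server decodes it back to `oai:catalogue.elra.info:ELRA-W0073`.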
Search Info
Citation: | n.a. 2013. ELRA (European Language Resources Association). | |
Terms: | area_Europe country_FR dcmi_Text iso639_fra olac_primary_text | |