OLAC Record oai:catalogue.elra.info:ELRA-W0049 |
Metadata | ||
Title: | "Le Monde Diplomatique" Arabic tagged corpus | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2009-03-31 | |
Date Issued (W3CDTF): | 2009-03-31 | |
Date Modified (W3CDTF): | 2009-03-31 | |
Description: | This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see also ELRA-W0036-04). Each text is associated with three files: the raw text in Arabic, the vowelised text in Arabic, and one XML file containing the morphological annotation of the text. Each word of a text corresponds to a node in the XML file. Each node contains the following positional features of the word: the number of the paragraph in which the word occurs, the sentence number within the paragraph, the sentence number within the text, the rank of the word in the text, the rank of the first character of the word in the text, and the word size. The word annotation is added as sub-nodes: the word as it appears in the non-vowelised text, the vowelised word, the word lemma, and the grammatical category of the word. | |
Identifier: | ELRA-W0049 | |
ISLRN: | 124-139-628-259-2 | |
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0049/ | |
Language: | Arabic | |
Language (ISO639): | ara | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
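The Description field above says that each word is a node carrying six positional features, with four annotation sub-nodes. The record does not give the actual element or attribute names, so the sketch below invents them (`word`, `paragraph`, `rank`, `unvowelised`, `category`, and so on) purely for illustration, and the sample word is made up rather than taken from the corpus. It shows one way such a file could be read with Python's standard library, assuming the positional features are stored as attributes.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Illustrative sample only: every element and attribute name here is an
# assumption, and the word is not taken from the corpus itself.
SAMPLE = """<text>
  <word paragraph="1" sentence_in_paragraph="1" sentence_in_text="1"
        rank="1" char_rank="0" size="3">
    <unvowelised>كتب</unvowelised>
    <vowelised>كَتَبَ</vowelised>
    <lemma>كَتَبَ</lemma>
    <category>verb</category>
  </word>
</text>"""


@dataclass
class Word:
    # Positional features listed in the Description field.
    paragraph: int
    sentence_in_paragraph: int
    sentence_in_text: int
    rank: int
    char_rank: int
    size: int
    # Annotation sub-nodes listed in the Description field.
    unvowelised: str
    vowelised: str
    lemma: str
    category: str


def parse_words(xml_text: str) -> list[Word]:
    """Turn each hypothetical <word> node into a Word record."""
    root = ET.fromstring(xml_text)
    words = []
    for node in root.iter("word"):
        words.append(
            Word(
                paragraph=int(node.get("paragraph")),
                sentence_in_paragraph=int(node.get("sentence_in_paragraph")),
                sentence_in_text=int(node.get("sentence_in_text")),
                rank=int(node.get("rank")),
                char_rank=int(node.get("char_rank")),
                size=int(node.get("size")),
                unvowelised=node.findtext("unvowelised", default=""),
                vowelised=node.findtext("vowelised", default=""),
                lemma=node.findtext("lemma", default=""),
                category=node.findtext("category", default=""),
            )
        )
    return words


words = parse_words(SAMPLE)
```

With the real files, the names in `parse_words` would need to be replaced by whatever the corpus documentation specifies.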
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0049 | |
DateStamp: | 2009-03-31 | |
GetRecord: | OAI-PMH request for simple DC format | |
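The GetRecord rows refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is put together, the snippet below builds the query string from the OaiIdentifier shown above. The base URL is a placeholder, since this record does not state the repository endpoint. `oai_dc` is the metadata prefix OAI-PMH defines for simple Dublin Core. The prefix for the OLAC format is assumed to be `olac`.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str,
                   metadata_prefix: str = "oai_dc",
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple Dublin Core request for this record.
dc_url = get_record_url("oai:catalogue.elra.info:ELRA-W0049")
# OLAC-format request; the "olac" prefix is an assumption.
olac_url = get_record_url("oai:catalogue.elra.info:ELRA-W0049", "olac")
```

`urlencode` percent-encodes the colons in the identifier, which keeps the identifier unambiguous inside the query string.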
Search Info | ||
Citation: | n.a. 2009. ELRA (European Language Resources Association). | |
Terms: | dcmi_Text iso639_ara olac_primary_text |