OLAC Record oai:catalogue.elra.info:ELRA-W0127 |
Metadata | ||
Title: | Normalized Arabic Fragments for Inestimable Stemming (NAFIS) | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2018-10-02 | |
Date Issued (W3CDTF): | 2018-10-02 | |
Description: | Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed of a collection of sentences, selected to be representative of Arabic stemming tasks and manually annotated. NAFIS is:
Comprehensive: The content of NAFIS can be generalized to the Arabic language as a whole. For stemming, a comprehensive corpus must contain all possible affix combinations. To this end, linguists made an inventory of all Arabic affix combinations. An affix is a prefix-suffix couple that can be agglutinated to a specific word type (noun, verb or particle). Arabic affixes consist of 12 atomic prefixes and 11 atomic suffixes. Combining them yields about 94 prefixes and 73 suffixes (the terms affix, prefix and suffix are used here instead of clitic, proclitic and enclitic because they are widely used in the literature). For example, the prefix “وَال” (and the) is composed of two atomic prefixes: “وَ” (the conjunction “and”) and “ال” (the definite article “the”).
Compiled: Linguists gathered a set of sentences containing all of the affixes listed above to satisfy the comprehensiveness criterion. The compiled sentences come from various sources (poems, the Holy Quran, books, and periodicals) and belong to diverse genres (proverb and dictum, article commentary, religious text, literature, historical fiction). For instance, the sentence "عليكم بالجد فإنه أساس النجاح" is part of the corpus and contains four affix combinations:
1. [-كم]: the empty prefix associated with the suffix pronoun 'you',
2. [بال-]: composed of two atomic prefixes ("ب", the preposition 'with', and "ال", the definite article 'the') and the empty suffix,
3. [ف-ه]: composed of the prefix "ف" (the conjunction 'then') and the suffix "ه" (the pronoun 'his'),
4. [ال-]: composed of "ال", the definite article 'the', and the empty suffix.
As shown in the extract below, NAFIS is represented according to the TEI standard.
Sentences are enclosed within the | |
Identifier: | ELRA-W0127 | |
ISLRN: 305-450-745-774-1 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0127/ | |
Language: | Arabic | |
Language (ISO639): | ara | |
Medium: | downloadable | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0127 | |
DateStamp: | 2018-10-02 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2018. ELRA (European Language Resources Association). | |
Terms: | dcmi_Text iso639_ara olac_primary_text |
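
Illustration: The notion of an affix as a prefix-suffix couple, as used in the Description field, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the annotation procedure used for NAFIS: the function name `strip_affix` and the `EXAMPLES` list are hypothetical, the four (word, prefix, suffix) triples are taken from the example sentence in the Description, and the remainder returned here is simply what is left after removing the affix, which is not necessarily the gold-standard stem recorded in the corpus.

```python
from typing import Optional


def strip_affix(word: str, prefix: str, suffix: str) -> Optional[str]:
    """Remove a (prefix, suffix) couple from `word`.

    Returns the remaining core, or None when the couple does not match
    or nothing would be left. Either side may be the empty string,
    which stands for the "empty prefix" or "empty suffix".
    """
    if not (word.startswith(prefix) and word.endswith(suffix)):
        return None
    end = len(word) - len(suffix)
    core = word[len(prefix):end]
    return core or None


# A compound prefix is the concatenation of atomic prefixes,
# e.g. the preposition "ب" followed by the definite article "ال".
COMPOUND_PREFIX = "ب" + "ال"

# (word, prefix, suffix) triples from the example sentence in the record.
EXAMPLES = [
    ("عليكم", "", "كم"),              # empty prefix + suffix pronoun
    ("بالجد", COMPOUND_PREFIX, ""),   # compound prefix + empty suffix
    ("فإنه", "ف", "ه"),               # conjunction prefix + pronoun suffix
    ("النجاح", "ال", ""),             # definite article + empty suffix
]

if __name__ == "__main__":
    for word, prefix, suffix in EXAMPLES:
        print(word, "->", strip_affix(word, prefix, suffix))
```

A real stemmer would also have to choose among competing affix candidates and handle diacritics; the sketch only shows the data shape of a prefix-suffix couple.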