OLAC Record oai:catalogue.elra.info:ELRA-W0119 |
Metadata | ||
Title: | Helsinki Corpus of Swahili | |
Access Rights: | Rights available for: commercialUse | |
Date Available (W3CDTF): | 2017-07-12 | |
Date Issued (W3CDTF): | 2017-07-12 | |
Date Modified (W3CDTF): | 2017-07-12 | |
Description: | This is a 25-million-word text corpus of Swahili, annotated for part-of-speech, morphology and syntax. The corpus contains prose text from the fiction, news media and government document domains, covering the period between 1953 and 2016. This package contains: (1) the Helsinki Corpus of Swahili 2.0 Non Annotated Version, which contains the raw material, formatted and corrected; (2) the Helsinki Corpus of Swahili 2.0 Annotated Version, annotated with the Salama Tagger and with metadata added to each file. The source texts were collected from the Web (news media texts from 1988 to 2016 and open government webpages from 2004 to 2006) and from books (published between 1953 and 1991, scanned and proofread). Part of the oldest news material, which predates scanners, was typed manually. The old material, collected before 2003, comprises Books and News. The new material comprises a Bunge section (Hansards of the Tanzanian Parliament from the years 2004, 2005 and 2006) and a News section (from 2004 to 2015). A word in the annotated corpus normally carries the following types of information: token, stem, part-of-speech, morphological description, syntactic tag, rest of verb description. The corpus was prepared at the University of Helsinki, Department of Asian and African Studies, under the auspices of Prof. Arvi Hurskainen. It is available from ELRA for commercial use only. For academic use, it is accessible via Kielipankki - the Language Bank of Finland in Korp (https://korp.csc.fi/). A corpus version with English glosses, in which each word in the corpus is provided with one or more lexical equivalents, can be distributed on demand (terms to be discussed on a case-by-case basis). | |
Identifier: | ELRA-W0119 | |
ISLRN: 941-187-059-145-7 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0119/ | |
Language: | Swahili (macrolanguage) | |
Language (ISO639): | swa | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0119 | |
DateStamp: | 2017-07-12 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2017. ELRA (European Language Resources Association). | |
Terms: | dcmi_Text iso639_swa olac_primary_text |
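
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL could be assembled as follows. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, and `oai_dc` is the prefix the protocol defines for simple Dublin Core. The base URL below is a placeholder, since the record does not state the repository endpoint, and the `olac` prefix is assumed from common OLAC practice rather than confirmed by this record. The helper name `get_record_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# OAI identifier taken from the "OaiIdentifier" field of this record.
OAI_ID = "oai:catalogue.elra.info:ELRA-W0119"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple DC.
olac_url = get_record_url(OAI_ID, "olac")
dc_url = get_record_url(OAI_ID, "oai_dc")
```

Fetching either URL from the real endpoint should return an XML response that wraps the metadata shown in this record.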