OLAC Record oai:catalogue.elra.info:ELRA-W0321 |
Metadata | ||
Title: | Tham Khasi annotated corpus | |
Access Rights: | Rights available for: nonCommercialUse, commercialUse | |
Date Available (W3CDTF): | 2022-03-09 | |
Date Issued (W3CDTF): | 2022-03-09 | |
Description: | The Tham Khasi annotated corpus is a corpus of Khasi, an Austro-Asiatic language. It comprises Khasi sentences extracted from textbooks prescribed for students at the secondary, higher secondary, graduation, and post-graduation levels in the year 2015-2016. In the corpus, each word is separated by a space, and each sentence ends with an end-of-sentence marker: a period (.), a question mark (?), or an exclamation mark (!). The sentences are manually tagged for parts of speech using the BIS (Bureau of Indian Standards) tagset, the standard annotation scheme prescribed for Indian languages. The corpus contains 83,312 words, 4,386 sentences, and 5,465 word types, amounting to 94,651 tokens (including punctuation). The corpus is provided as a single file in text format. | |
Identifier: | ELRA-W0321 | |
Identifier: | ISLRN: 926-738-235-188-8 | |
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0321/ | |
Language: | Khasi | |
Language (ISO639): | kha | |
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
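The Description field above states that words are separated by spaces and that each sentence ends with a period, question mark, or exclamation mark. The following is a minimal sketch of how counts like those quoted in the record could be recomputed from such a file. It rests on assumptions not confirmed by the record: that end-of-sentence markers appear as standalone tokens, and that part-of-speech tags can be ignored for counting (the record does not specify how tags are attached to words). The function name `corpus_stats` and the placeholder sample text are hypothetical and are not taken from the corpus.

```python
# Assumption: sentence-final markers appear as standalone, space-separated tokens.
SENTENCE_END = {".", "?", "!"}


def corpus_stats(text):
    """Count tokens, words, sentences, and word types in space-separated text."""
    tokens = text.split()

    sentences = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            # An end-of-sentence marker closes the current sentence.
            sentences.append(current)
            current = []
    if current:
        # Keep trailing tokens that lack a final marker as one sentence.
        sentences.append(current)

    # Only the three end markers are excluded here; other punctuation
    # (for example commas) would still be counted as words by this sketch.
    words = [t for t in tokens if t not in SENTENCE_END]

    return {
        "tokens": len(tokens),
        "words": len(words),
        "sentences": len(sentences),
        "word_types": len(set(words)),
    }


# Placeholder tokens, not real corpus content.
sample = "w1 w2 w3 . w2 w4 ? w1 !"
stats = corpus_stats(sample)
print(stats)
```

Because the sketch excludes only the three end markers from the word count, its numbers would not necessarily match the figures in the record, which count all punctuation separately from words.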
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0321 | |
DateStamp: | 2022-03-09 | |
GetRecord: | OAI-PMH request for simple DC format | |
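The GetRecord entries above refer to OAI-PMH requests. As a rough sketch, such a request is an HTTP GET whose query string carries the `verb`, `identifier`, and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a hypothetical placeholder, since the record does not give the repository endpoint, and the helper name `get_record_url` is likewise illustrative. `oai_dc` is the protocol's standard prefix for simple Dublin Core; the prefix used for the OLAC format is assumed to be `olac`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not state the repository's base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# The identifier is taken from the OaiIdentifier field of this record.
url = get_record_url(BASE_URL, "oai:catalogue.elra.info:ELRA-W0321", "oai_dc")
print(url)
```

The sketch only constructs the URL and does not send a request; fetching and parsing the XML response would require the real endpoint.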
Search Info | ||
Citation: | n.a. 2022. ELRA (European Language Resources Association). | |
Terms: | area_Asia country_IN dcmi_Text iso639_kha olac_primary_text |