OLAC Record oai:lindat.mff.cuni.cz:11372/LRT-1240 |
Metadata | ||
Title: | BulTreeBank Tokenizer | |
Bibliographic Citation: | http://hdl.handle.net/11372/LRT-1240 | |
Contributor: | Simov, Kiril | |
Creator: | Simov, Kiril | |
Date (W3CDTF): | 2014-07-30T21:33:43Z | |
Date Available: | 2014-07-30T21:33:43Z | |
Description: | The tokenizer covers all languages that use the Latin1, Latin2, Latin3 and Cyrillic tables of Unicode. It can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and is easy to adapt to new token categories. | |
Identifier (URI): | http://hdl.handle.net/11372/LRT-1240 | |
Language: | No linguistic content | |
Language (ISO639): | zxx | |
Publisher: | Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences | |
Type: | toolService | |
Type (DCMI): | Software | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11372/LRT-1240 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Simov, Kiril. 2014. Linguistic Modeling Department, IPP, Bulgarian Academy of Sciences. | |
Terms: | dcmi_Software iso639_zxx |
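
The Description field above mentions a cascaded regular grammar that assigns tokens to categories. The following Python sketch illustrates that general idea only. It is not the CLaRK grammar and does not reproduce the tool's 60+ categories. The category names (LATIN, CYRILLIC, DIGITS, SPACE, PUNCT, NUMBER), the Unicode ranges, and the function names are assumptions made for illustration. A first stage splits text into primitive tokens by character class, and a second stage combines primitives into a higher-level category.

```python
import re

# Stage 1 (assumed categories): split text into primitive tokens by character class.
# The Latin range covers Basic Latin letters plus Latin-1 Supplement and
# Latin Extended-A/B letters, skipping the multiplication and division signs.
PRIMITIVE = re.compile(
    r"(?P<LATIN>[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+)"
    r"|(?P<CYRILLIC>[\u0400-\u04FF]+)"
    r"|(?P<DIGITS>[0-9]+)"
    r"|(?P<SPACE>\s+)"
    r"|(?P<PUNCT>.)",  # fallback: any other single character
    re.DOTALL,
)


def primitives(text):
    """Return (category, text) pairs for every primitive token."""
    return [(m.lastgroup, m.group()) for m in PRIMITIVE.finditer(text)]


def merge_numbers(tokens):
    """Stage 2 (hypothetical rule): DIGITS + '.' or ',' + DIGITS becomes NUMBER."""
    separators = (("PUNCT", "."), ("PUNCT", ","))
    out = []
    i = 0
    while i < len(tokens):
        if (
            i + 2 < len(tokens)
            and tokens[i][0] == "DIGITS"
            and tokens[i + 1] in separators
            and tokens[i + 2][0] == "DIGITS"
        ):
            merged = tokens[i][1] + tokens[i + 1][1] + tokens[i + 2][1]
            out.append(("NUMBER", merged))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def tokenize(text):
    """Run the two stages in sequence and drop whitespace tokens."""
    tokens = merge_numbers(primitives(text))
    return [t for t in tokens if t[0] != "SPACE"]


if __name__ == "__main__":
    for category, token in tokenize("Цена 3,14 EUR."):
        print(category, token)
```

In this sketch, supporting a new token category means adding another stage-2 rule after merge_numbers, and covering a further Unicode table means adding one more named alternative to PRIMITIVE. Both are loose analogies to the extensibility the Description claims, not a statement about how the actual tool is configured.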