OLAC Record oai:lindat.mff.cuni.cz:11234/1-3776 |
Metadata | ||
Title: | FERNET-C5 | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-3776 | |
Creator: | Lehečka, Jan | |
Švec, Jan | ||
Date (W3CDTF): | 2021-09-20T12:33:51Z | |
Date Available: | 2021-09-20T12:33:51Z | |
Description: | FERNET-C5 is a monolingual BERT language representation model trained from scratch on the Czech Colossal Clean Crawled Corpus (C5), a Czech counterpart of the English C4 dataset. The training data contained almost 13 billion words (93 GB of text). The model has the same architecture as the original BERT model, i.e. 12 transformer blocks, 12 attention heads, and a hidden size of 768. In contrast to Google's BERT models, we used SentencePiece tokenization instead of Google's internal WordPiece tokenization. More details can be found in README.txt. A more detailed description is available at https://arxiv.org/abs/2107.10042. The same models are also released at https://huggingface.co/fav-kky/FERNET-C5. | |
Identifier (URI): | http://hdl.handle.net/11234/1-3776 | |
Language: | Czech | |
Language (ISO639): | ces | |
Publisher: | University of West Bohemia, Department of Cybernetics | |
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | Czech | |
BERT | ||
Czech language | ||
Subject (ISO639): | ces | |
Type: | languageDescription | |
Type (DCMI): | Text | |
Type (OLAC): | language_description | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-3776 | |
DateStamp: | 2021-09-20 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Lehečka, Jan; Švec, Jan. 2021. University of West Bohemia, Department of Cybernetics. | |
Terms: | area_Europe country_CZ dcmi_Text iso639_ces olac_language_description | |
Inferred Metadata | ||
Country: | Czech Republic | |
Area: | Europe |
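
The GetRecord entries listed under OLAC Info and OAI Info refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL could be assembled as follows. The base URL is a hypothetical placeholder, since the archive's actual OAI-PMH endpoint is not given in this record, and the helper name `get_record_url` is illustrative only. The `verb`, `identifier`, and `metadataPrefix` arguments come from the OAI-PMH protocol; `oai_dc` is the prefix for simple Dublin Core and `olac` the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the archive's real OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-3776"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One URL per format mentioned in the record.
olac_url = get_record_url(RECORD_ID, "olac")    # OLAC format
dc_url = get_record_url(RECORD_ID, "oai_dc")    # simple DC format
```

Replacing `BASE_URL` with the archive's real endpoint would be needed before sending either request.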