OLAC Record oai:lindat.mff.cuni.cz:11234/1-2615 |
Metadata | ||
Title: | SumeCzech | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-2615 | |
Creator: | Straka, Milan | |
Mediankin, Nikita | ||
Kocmi, Tom | ||
Žabokrtský, Zdeněk | ||
Hudeček, Vojtěch | ||
Hajič, Jan | ||
Date (W3CDTF): | 2020-01-10T09:44:46Z | |
Date Available: | 2020-01-10T09:44:46Z | |
Description: | This entry contains the SumeCzech dataset and the metric RougeRAW used for evaluation. Both the dataset and the metric are described in the paper "SumeCzech: Large Czech News-Based Summarization Dataset" by Milan Straka et al. The dataset is distributed as a set of Python scripts which download the raw HTML pages from CommonCrawl and then process them into the required format. The MPL 2.0 license applies to the scripts downloading the dataset and to the RougeRAW implementation. Note: sumeczech-1.0-update-230225.zip is the updated release of the SumeCzech download script, including the original RougeRAW evaluation metric. The download script was modified to use the updated CommonCrawl download URL and to support Python 3.10 and Python 3.11. However, the downloaded dataset is still exactly the same. The original archive sumeczech-1.0.zip was renamed to sumeczech-1.0-obsolete-180213.zip and is kept for reference. | |
Identifier (URI): | http://hdl.handle.net/11234/1-2615 | |
Language: | Czech | |
Language (ISO639): | ces | |
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Mozilla Public License 2.0 | |
http://opensource.org/licenses/MPL-2.0 | ||
Subject: | summarization | |
SumeCzech | ||
Rouge | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-2615 | |
DateStamp: | 2023-02-27 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Straka, Milan; Mediankin, Nikita; Kocmi, Tom; Žabokrtský, Zdeněk; Hudeček, Vojtěch; Hajič, Jan. 2020. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text |
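
The GetRecord entries listed under OLAC Info and OAI Info are standard OAI-PMH requests, which take the form of a base URL followed by the `verb`, `identifier`, and `metadataPrefix` arguments. The following is a minimal sketch of how such request URLs could be built for this record's OaiIdentifier. The base URL is a hypothetical placeholder, since this record does not state the repository's OAI-PMH endpoint; the function name `get_record_url` is likewise illustrative. The prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" entries.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record above does not give the OAI-PMH
# base URL of the repository, so substitute the real endpoint here.
BASE_URL = "https://example.org/oai/request"

# Taken from the OaiIdentifier field of this record.
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-2615"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple Dublin Core.
olac_url = get_record_url(RECORD_ID, "olac")
dc_url = get_record_url(RECORD_ID, "oai_dc")
print(olac_url)
print(dc_url)
```

The identifier is percent-encoded by `urlencode`, so the colons and the slash it contains do not interfere with the query string.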