OLAC Record oai:lindat.mff.cuni.cz:11234/1-2943 |
Metadata | ||
Title: | OAGK Keyword Generation Dataset | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-2943 | |
Creator: | Çano, Erion | |
Date (W3CDTF): | 2019-03-08T12:46:46Z | |
Date Available: | 2019-03-08T12:46:46Z | |
Description: | OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from scientific articles. Texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. This data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY licence. This data (OAGK Keyword Generation Dataset) is released under the CC-BY licence (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2019, Minneapolis, USA | |
Identifier (URI): | http://hdl.handle.net/11234/1-2943 | |
Is Replaced By (URI): | http://hdl.handle.net/11234/1-3062 | |
Language: | English | |
Language (ISO639): | eng | |
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Creative Commons - Attribution 4.0 International (CC BY 4.0) | |
http://creativecommons.org/licenses/by/4.0/ | ||
Subject: | keyword extraction | |
supervised keyword generation | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-2943 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Çano, Erion. 2019. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |
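
Usage sketch: the Description above states that dataset records are stored as JSON lines in each text file. Assuming this means one JSON value per line, a minimal Python reader could look like the following. The function name read_json_lines and the file name in the usage comment are hypothetical, and the record field names are not given in this record, so the sketch does not assume any.

```python
import json
from typing import Any, Iterator


def read_json_lines(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


# Hypothetical usage; replace the path with an actual file from the dataset:
# for record in read_json_lines("oagk_part.txt"):
#     print(record)
```

Reading lazily, one line at a time, avoids loading a multi-million-record file into memory at once.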