OLAC Record
oai:lindat.mff.cuni.cz:11234/1-2943

Metadata
Title:OAGK Keyword Generation Dataset
Bibliographic Citation:http://hdl.handle.net/11234/1-2943
Creator:Çano, Erion
Date (W3CDTF):2019-03-08T12:46:46Z
Date Available:2019-03-08T12:46:46Z
Description:OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from cientific articles. Texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. This data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY licence. This data (OAGK Keyword Generation Dataset) is released under CC-BY licence (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2019, Minneapolis, USA
Identifier (URI):http://hdl.handle.net/11234/1-2943
Is Replaced By (URI):http://hdl.handle.net/11234/1-3062
Language:English
Language (ISO639):eng
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
Subject:keyword extraction
supervised keyword generation
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-2943
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Çano, Erion. 2019. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-2943
Up-to-date as of: Thu Oct 5 0:40:57 EDT 2023