OLAC Record
oai:lindat.mff.cuni.cz:11234/1-3062

Metadata
Title:OAGKX Keyword Generation Dataset
Bibliographic Citation:http://hdl.handle.net/11234/1-3062
Creator:Çano, Erion
Date (W3CDTF):2019-10-21T13:30:24Z
Date Available:2019-10-21T13:30:24Z
Description:OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keyword strings from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. The data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY license. This data (OAGKX Keyword Generation Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano Erion, Bojar Ondřej. Keyphrase Generation: A Multi-Aspect Survey. FRUCT 2019, Proceedings of the 25th Conference of the Open Innovations Association FRUCT, Helsinki, Finland, Nov. 2019 To reproduce the experiments in the above paper, you can use the first 100000 lines of part_0_0.txt file.
Identifier (URI):http://hdl.handle.net/11234/1-3062
Language:English
Language (ISO639):eng
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Replaces (URI):http://hdl.handle.net/11234/1-2943
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
http://creativecommons.org/licenses/by/4.0/
Subject:keyword extraction
supervised keyword generation
abstractive keyphrasing
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-3062
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Çano, Erion. 2019. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-3062
Up-to-date as of: Thu Oct 5 0:41:00 EDT 2023