OLAC Record oai:lindat.mff.cuni.cz:11234/1-3079
Metadata
Title: OAGSX Title Generation Dataset
Bibliographic Citation: http://hdl.handle.net/11234/1-3079
Creator: Çano, Erion
Date (W3CDTF): 2019-10-31T09:04:42Z
Date Available: 2019-10-31T09:04:42Z
Description: OAGSX is a title generation dataset consisting of 34,408,509 abstracts and titles from scientific articles. The texts were lowercased and tokenized with the Stanford CoreNLP tokenizer; no other preprocessing was applied in this release. Each text file stores the dataset records (samples) as JSON Lines, one record per line. The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license. This data (OAGSX Title Generation Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If you use it, please also consider citing the following paper: Çano Erion, Bojar Ondřej. Two Huge Title and Keyword Generation Corpora of Research Articles. LREC 2020, Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France, May 2020.
Identifier (URI): http://hdl.handle.net/11234/1-3079
Language: English
Language (ISO639): eng
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Replaces (URI): http://hdl.handle.net/11234/1-3043
Rights: Creative Commons - Attribution 4.0 International (CC BY 4.0); http://creativecommons.org/licenses/by/4.0/
Subject: Title Generation Dataset; Abstractive Text Summarization; Scientific Papers Corpus
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text
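The Description says the records are stored as JSON Lines. The following is a minimal Python sketch for streaming them, assuming each non-empty line holds one JSON object. This record does not list the field names, so the sketch assumes none and instead reports the top-level keys it finds. The helper names (`parse_jsonl`, `iter_records`, `summarize`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple, Union

# Hypothetical helpers; field names inside each record are NOT assumed.


def parse_jsonl(lines: Iterable[str], source: str = "<input>") -> Iterator[dict]:
    """Decode one JSON object per non-empty line, following the JSON Lines convention."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of a file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report source and line number so a corrupt record is easy to locate.
            raise ValueError(f"{source}:{line_number}: invalid JSON ({err.msg})") from err


def iter_records(path: Union[str, Path]) -> Iterator[dict]:
    """Stream records from one dataset file without loading the whole file into memory."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_jsonl(handle, source=str(path))


def summarize(records: Iterable[dict]) -> Tuple[int, List[str]]:
    """Count records and collect every top-level key seen, in sorted order."""
    count = 0
    keys = set()
    for record in records:
        count += 1
        keys.update(record)  # iterating a dict yields its top-level keys
    return count, sorted(keys)
```

For example, `summarize(iter_records(path))` on one of the dataset's text files returns the record count and the key names actually present. Streaming line by line, rather than calling `json.load` on a whole file, keeps memory use flat even though the full release holds tens of millions of records.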
OLAC Info
Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord: OAI-PMH request for OLAC format
GetRecord: Pre-generated XML file
OAI Info
OaiIdentifier: oai:lindat.mff.cuni.cz:11234/1-3079
DateStamp: 2021-06-29
GetRecord: OAI-PMH request for simple DC format
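The GetRecord entries above refer to OAI-PMH requests. The following sketch assumes the standard OAI-PMH 2.0 `GetRecord` verb, which takes the `identifier` and `metadataPrefix` arguments. It shows how a request URL for this record's OaiIdentifier could be built, and how a simple Dublin Core (`oai_dc`) response could be read with the Python standard library. The function names are hypothetical, and the endpoint is left to the caller because the archive's OAI-PMH base URL is not given in this record. OLAC-format responses would use the `olac` prefix and a different schema that `dc_fields` does not parse.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord request URL; base_url is the (caller-supplied) OAI-PMH endpoint."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,  # ':' and '/' are percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def dc_fields(response_xml: str) -> Dict[str, List[str]]:
    """Collect Dublin Core elements from an oai_dc GetRecord response, keyed by element name."""
    root = ET.fromstring(response_xml)
    error = root.find("oai:error", NS)
    if error is not None:
        # OAI-PMH reports problems (e.g. an unknown identifier) in an <error code="..."> element.
        raise ValueError(f"OAI-PMH error {error.get('code')}: {(error.text or '').strip()}")
    fields: Dict[str, List[str]] = {}
    dc = root.find("oai:GetRecord/oai:record/oai:metadata/oai_dc:dc", NS)
    if dc is None:
        return fields
    for element in dc:
        name = element.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

To fetch a live record, the URL from `get_record_url(endpoint, "oai:lindat.mff.cuni.cz:11234/1-3079")` could be passed to `urllib.request.urlopen` and the decoded body handed to `dc_fields`. Repeated elements such as `subject` are kept as lists because Dublin Core allows multiple values per element.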
Search Info
Citation: Çano, Erion. 2019. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text