OLAC Record oai:www.ldc.upenn.edu:LDC2025T07 |
Metadata | ||
Title: | KAIROS Schema Learning Complex Event Annotation | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Chen, Song, et al. KAIROS Schema Learning Complex Event Annotation LDC2025T07. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
Contributor: | Chen, Song | |
Tracey, Jennifer | ||
Bies, Ann | ||
Caruso, Christopher | ||
Strassel, Stephanie | ||
Date (W3CDTF): | 2025 | |
Date Issued (W3CDTF): | 2025-06-16 | |
Description: | *Introduction*
KAIROS Schema Learning Complex Event Annotation was developed by the Linguistic Data Consortium (LDC) to support the DARPA KAIROS program. It contains English and Spanish text, audio, video and image data labeled for 93 real-world complex events (CEs) with event, relation and argument annotations linking to document provenance.
The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus.
*Data*
Source data was collected from the web by LDC. In total, 3,431 root web pages were collected and processed, yielding 1,919 text data files, 24,019 image files, 1,472 video files and 16 audio files. Annotation steps included provenance linking (linking events in a document to a CE) and mentions (event and relation frames). Data scouting and annotation guidelines are included in the documentation accompanying this release.
The table below summarizes the number of documents collected and the annotation applied to them:
* Total CEs - total complex events subject to data collection and annotation
* Total Docs Source - CE-relevant root documents collected and processed
* Total Docs for Provlink - root documents labeled for provenance linking
* Total Docs Mention - root documents labeled for events, relations, and schema linking
| Language | Total CEs | Total Docs Source | Total Docs for Provlink | Total Docs Mention |
|---|---|---|---|---|
| English | 93 | 2,190 | 650 | 216 |
| Spanish | 90 | 1,241 | 493 | 122 |
| Total | 93 | 3,431 | 1,143 | 338 |
Software tools are also included in this release. The tools recreate original source data from the processed XML material.
* ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data)
* ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives
*Sponsorship*
KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014.
*Samples*
Please view the following samples:
* Argument Mentions
* Complex Event Linking
* Event Mention Annotation
* Event Mention Argument Slots
* Provlinking
* Relation Mention Annotation
* Relation Mention Argument Slots
*Updates*
No updates at this time. | |
Extent: | Corpus size: 49067401 KB | |
Identifier: | LDC2025T07 | |
https://catalog.ldc.upenn.edu/LDC2025T07 | ||
ISLRN: 547-554-339-324-6 | ||
DOI: 10.35111/g7sc-nt96 | ||
Language: | Spanish | |
English | ||
Language (ISO639): | spa | |
eng | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025T07 | |
Rights Holder: | Portions © 2017 13.CL, © 2019 47abc, © 2020 ABC News Internet Ventures, © 2018-2020 A&E Television Networks, LLC, © 2017-2018 AL DÍA NEWS Media, © 2017, 2019-2020 ALM Media Properties, LLC, © 2020 AlMomento.net, © 2020 American City Business Journals, © 2020 Anti-Defamation League, © 2019-2020 Autodesk, Inc., © 2014, 2020 Bloomberg L.P., © 2016-2017, 2019 BuzzFeed, Inc., © 2020 Cable News Network. A Warner Media Company, © 2016-2018 CBS Interactive Inc., © 2020 Charlotte Observer, © 2019 Chicago Tribune, © 2014, 2018 China Daily Information Co., © 2020 Cision US Inc., © 2020 Contxto, © 2013, 2019-2020 Corporation of Spanish Radio and Television, © 2020 Divorce Source, Inc., © 2004, 2006, 2007 GateHouse Media, LLC, © 2020 GlobeNewswire, Inc., © 2017 GOBankingRates, © 2015, 2019, 2020 Gray Television, Inc., © 2008 Griffin Communications, © 2020 Hearst Magazine Media, Inc., © 2011-2019 Impremedia Operating Company LLC, © 2017-2020 Insider Inc, © 2020 KPWHRI, © 2018, 2020 KQED Inc., © 2020 Kurdistan24, © 2020 Latin American Information Agency Prensa Latina, © 2017, 2019-2020 Listen Notes, Inc., © 2017-2018, 2020 Los Angeles Times, © 2016, 2018-2019 Microsoft, © 2020 MJH Life Sciences and Pharmacy Times, © 2016 MUNDOJURIDICO.INFO, © 2018, 2020 NBCUniversal Media, LLC, © 2017, 2019 News Group Newspapers Limited, © 2015, 2018-2020 Nexstar Inc., © 2016, 2019 NYP Holdings, Inc., © 2011, 2015, 2017, 2020 Patch Media, © 2019 Peoria Public Radio, © 2016, 2019-2020 Perfil.com, © 2016 Plan V, © 2020 Public Citizen, © 2014, 2019 Republica Media Group, © 2019 Reuters, © 2013-2014, 2018 RFE/RL, Inc., © 2013, 2020 Scientific American, A Division of Springer Nature America, Inc., © 2014-2015, 2017 StarMedia, © 2020 Tacoma News Tribune, © 2018 The Cumberland Times-News, © 2014, 2017-2018 The New York Times Company, © 2018-2019 The Philadelphia Inquirer, LLC, © 2019-2020 THE POINTS GUY, LLC, © 2020 The Regents of The University of California, © 2018 The Sacramento Bee, © 2017, 2019 The Texas Tribune, © 2014, 2017-2018 The Washington Post, © 2010, 2012, 2017 The World from PRX, © 2020 Tri-City Herald, © 2017, 2019-2020 Univision Communications Inc., © 2016 WVTF, © 2021, 2025 Trustees of the University of Pennsylvania | |
Subject: | Spanish language | |
English language | ||
Subject (ISO639): | spa | |
eng | ||
Subject (OLAC): | computational_linguistics | |
Type (DCMI): | Image | |
MovingImage | ||
Software | ||
Sound | ||
StillImage | ||
Text | ||
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025T07 | |
DateStamp: | 2025-06-16 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Chen, Song; Tracey, Jennifer; Bies, Ann; Caruso, Christopher; Strassel, Stephanie. 2025. Linguistic Data Consortium. | |
Terms: | area_Europe country_ES country_GB dcmi_Image dcmi_MovingImage dcmi_Software dcmi_Sound dcmi_StillImage dcmi_Text iso639_eng iso639_spa olac_computational_linguistics olac_primary_text | |
Inferred Metadata | ||
Country: | Spain | |
United Kingdom | ||
Area: | Europe |
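
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple Dublin Core formats. As a rough sketch of how such a request URL is typically assembled, the snippet below builds the query string from the standard OAI-PMH parameters (`verb`, `identifier`, `metadataPrefix`). The base URL is a placeholder and an assumption, since this record does not state the endpoint address; the helper name `get_record_url` is also hypothetical.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint. The record above does not give the
# actual OAI-PMH base URL, so substitute the real one before use.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2025T07"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" requests the OLAC format; "oai_dc" requests simple Dublin Core.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which is why the printed URLs contain `%3A`.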