OLAC Record oai:www.ldc.upenn.edu:LDC2025T11 |
Metadata | ||
Title: | KAIROS Phase 1 Quizlet | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Chen, Song, et al. KAIROS Phase 1 Quizlet LDC2025T11. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
Contributor: | Chen, Song | |
Bies, Ann | ||
Mott, Justin | ||
Caruso, Christopher | ||
Tracey, Jennifer | ||
Strassel, Stephanie | ||
Date (W3CDTF): | 2025 | |
Date Issued (W3CDTF): | 2025-08-15 | |
Description: | *Introduction* KAIROS Phase 1 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 1 of the DARPA KAIROS program. KAIROS Quizlets were a series of narrowly defined tasks designed to explore specific evaluation objectives, enabling KAIROS system developers to exercise individual system components on a small data set prior to the full program evaluation. This corpus contains the complete set of Quizlet data used in Phase 1, which focused on two real-world complex events (CEs) within the Improvised Explosive Device bombing scenario: CE1001 (2018 Caracas drone attack) and CE1002 (Utah high school backpack bombing). The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems used formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; these schemas were then combined with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus. *Data* Four quizlets were developed in Phase 1. In addition to the source documents, this release contains the contents of Quizlet 3 (graph G annotation generated with manual annotation) and Quizlet 4 (source documents, manual annotation and updated graph G). Quizlet 1 (evaluation task introduction) did not require data or annotation and is not included in this release. Quizlet 2 (schema generation and instantiation) used source documents but did not include annotation. Source data was collected from the web: 30 root web pages were collected and processed, yielding 29 text data files, 216 image files and 5 video files. 
Annotation steps included labeling scenario-relevant events and relations in each document to develop a structured representation of temporally ordered events, relations and arguments, and to generate a reference knowledge graph. Source data is presented in various formats: .gif, .jpg, .ltf, .mp4, .png, .psm and .svg. Annotations are presented as tab-separated (.tab) files covering temporal ordering, relations, events and arguments. *Samples* Please view these samples: * Argument Annotations (.tab) * Graph G (.json) * PSM (.xml) * LTF (.xml) *Sponsorship* KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014. *Updates* No updates at this time. | |
Extent: | Corpus size: 125 KB | |
Identifier: | LDC2025T11 | |
https://catalog.ldc.upenn.edu/LDC2025T11 | ||
ISLRN: 357-044-554-407-1 | ||
DOI: 10.35111/rcba-vb61 | ||
Language: | English | |
Spanish | ||
Language (ISO639): | eng | |
spa | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025T11 | |
Rights Holder: | Portions © 2019 Boston Globe Media Partners, LLC, © 2019 Critical Threats Project, © 2020 El Comercio Group, © 2020 Europa Press, © 2020 France 24, © 2020 Frandsen Digital Media, LLC, © 2020 Gannett Satellite Information Network, LLC, © 2020 Google LLC, © 2020 Guardian News & Media Limited or its affiliated companies, © 2019 Hearst Magazine Media, Inc., © 2020 Malecon Media Group SL, © 2020 NBCUNIVERSAL MEDIA, LLC, © 2020 Nexstar Media Group, Inc., © 2019 ROHM CO., LTD., © 2020 Sinclair Broadcast Group, © 2020 Spain Export Film & TV, © 2020 The Associated Press, © 2019 The Atlantic Monthly Group, © 2020 The E.W. Scripps Company, © 2019 The New York Times Company, © 2020 The Republic EC, © 2020 Vox Media, LLC, © 2020 Yahoo, © 2020, 2025 Trustees of the University of Pennsylvania | |
Subject: | English language | |
Subject (ISO639): | eng | |
Type (DCMI): | Image | |
MovingImage | ||
Software | ||
Sound | ||
StillImage | ||
Text | ||
Type (OLAC): | primary_text | |
OLAC Info |
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025T11 | |
DateStamp: | 2025-08-15 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Chen, Song; Bies, Ann; Mott, Justin; Caruso, Christopher; Tracey, Jennifer; Strassel, Stephanie. 2025. Linguistic Data Consortium. | |
Terms: | area_Europe country_ES country_GB dcmi_Image dcmi_MovingImage dcmi_Software dcmi_Sound dcmi_StillImage dcmi_Text iso639_eng iso639_spa olac_primary_text | |
Inferred Metadata | ||
Country: | United Kingdom | |
Area: | Europe |