OLAC Record oai:www.ldc.upenn.edu:LDC2024T04 |
Metadata | ||
Title: | AIDA Scenario 2 Practice Topic Source Data | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Tracey, Jennifer, et al. AIDA Scenario 2 Practice Topic Source Data LDC2024T04. Web Download. Philadelphia: Linguistic Data Consortium, 2024 | |
Contributor: | Tracey, Jennifer | |
Strassel, Stephanie | ||
Getman, Jeremy | ||
Bies, Ann | ||
Griffitt, Kira | ||
Graff, David | ||
Caruso, Christopher | ||
Date (W3CDTF): | 2024 | |
Date Issued (W3CDTF): | 2024-04-15 | |
Description: | *Introduction* AIDA Scenario 2 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and comprises 1,500 root documents, including text, image, and video, from English, Russian, and Spanish web sources. The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating, and annotating multimodal linguistic resources in multiple languages. Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 2 scenario focused on the socioeconomic and political crisis in Venezuela since 2010. This corpus constitutes the full set of topic-focused documents for Phase 2 practice subtopics. *Data* Data was collected from web sources by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page). The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10). *Sponsorship* This material is based upon work supported by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013. *Samples* Please view the following samples: * LTF XML * PSM XML *Updates* None at this time. | |
Extent: | Corpus size: 7465441 KB | |
Format: | Sampling Rate: 44100 Hz | |
Sampling Format: mpeg | ||
Identifier: | LDC2024T04 | |
https://catalog.ldc.upenn.edu/LDC2024T04 | ||
ISLRN: 484-106-854-383-0 | ||
DOI: 10.35111/0hze-0459 | ||
Language: | English | |
Spanish | ||
Russian | ||
Language (ISO639): | eng | |
spa | ||
rus | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2024T04 | |
Rights Holder: | Portions © 2015 21st Century Wire, © 2020 ABC, © 2013 ABC News Internet Ventures, © 2014, 2017-2018 Alba Ciudad 96.3 FM, © 2017 AL DÍA NEWS Media, © 2017-2018 Al Jazeera Media Network, © 2018 AméricaEconomía, © 2019 American Association for the Advancement of Science, © 2019 Americas Society/Council of the Americas, © 2020 AMX Content SA de CV, © 2014, 2017 Arguments and Facts JSC, © 2014 ARMENPRESS, © 2018 Authorized by the Chief Agent, CPC, © 2014, 2017-2018 Autonomous Nonprofit Organization “TV-Novosti”, © 2013-2014, 2018-2019 BBC, © 2015, 2017-2018 Bellingcat, © 2019 Breitbart, © 2018 Business capital, © 2020 business/media bureau ekonomika, © 2019-2020 C.A. IBERONEWS LIMITED, © 2018-2020 C.A. The Universe, © 2013, 2017 Cable News Network. Turner Broadcasting System, Inc., © 2017 Caracas Chronicles, © 2018 Caracol SA, © 2018 CARACOL TELEVISIÓN SA, © 2013, 2017 CBC/Radio-Canada, © 2013 CBS Interactive Inc., © 2020 CDN, © 2017 Center for Democracy in the Americas, © 2014-2015 Channel One, © 2017 Chicago Tribune, © 2020 China Daily Information Co, © 2014 CJSC Editorial office of the newspaper Moskovsky Komsomolets, © 2014 CNBC LLC, © 2020 COHA, © 2014 Colombia Reports, © 2015, 2012 Comments, © 2018 COMUNICAN SA, © 2018 Condé Nast, © 2019-2020 CounterPunch, © 2020 Crisis Group, © 2019 Dailymotion, © 2020 Daily News of Vladivostok, © 2018 DiarioContraste.com, © 2017 Diariocorreo.pe, © 2014 Diario La Voz, © 2018, 2020 Diario las Americas, © 2018 Dicasterium pro Communicatione, © 2019 Dixi Media Digital, SL, © 2014 DolarToday.com, © 2014, 2017 Dow Jones & Company, Inc., © 2020 EADaily, © 2014, 2017-2018 EDICIONES EL PAÍS SL, © 2018 Ediciones Prensa Libre SL, © 2019 Editions CDR, © 2020 Editorial Ecoprensa, S.A., © 2017-2018 Editorial Office of Rossiyskaya Gazeta, © 2018 Editorial Prensa Alicantina SAU, © 2018 Efecto Cocuyo CA, © 2020 EL COLOMBIANO S.A.S, © 2014 Elcomercio.pe, © 2018, 2020 EL HERALDO S.A., © 2019 El Impulso, © 2018 El Nuevo Herald, © 2019 EL PERIÓDICO DE CATALUNYA, SLU, © 2019-2020 el Popular, © 2020 EL TERRITORIO, © 2017 EL TIEMPO Casa Editorial, © 2017 El Tiempo Latino, © 2020 elucabista, © 2018-2019 El Universal, © 2020 Encyclopedia Britannica, Inc., © 2019 Entravision, © 2019 Epoch Times Russia, © 2019 euronews, © 2018-2019 Europa Press, © 2018 Euroradio, © 2020 Excelsior, © 2014 FAN, © 2018 First News Media, © 2014 Forbes.com LLC, © 2018 France 24, © 2017 Future Publishing Limited, Quay House, The Ambury, Bath BA1 1UA, © 2020 GardaWorld, © 2020 GlobalResearch.ca, © 2020 G/O Media Inc., © 2014-2015, 2017 Golden Middle LLC, © 2018-2019 Google LLC, © 2014 GORDON, © 2014 Graham Digital Holding Company, © 2018 Grupo La República Publicaciones SA, © 2014 Guardian News and Media Limited or its affiliated companies, © 2014 Haaretz Daily Newspaper Ltd., © 2020 Havana Times, © 2019 HindustanTimes, © 2018, 2020 HispanTV, © 2020 Houston Public Media, A Service of the University of Houston, © 2020 HSB Group, © 2018 ID "Interlocutor", © 2014 Image and Communication, © 2018 Impremedia Operating Company LLC, © 2017 Independent.co.uk, © 2018-2020 Infobae, © 2017 Information agency "Ukrainian National News", © 2014 Informe21.com, © 2018 Innova and Comunica Media SL, © 2014 InoSMI.ru, © 2017 Interfax-Ukraine, © 2017 iPress.ua, © 2020 IT Plus, © 2018 Izvestia MIC, © 2017 Journal Media Ltd., © 2018 Journalistic Society El Ciudadano Ltda, © 2015-2018 JSC Business News Media, © 2014-2015, 2017 JSC Kommersant, © 2014, 2017-2018 JSC Gazeta.Ru, © 2018 JSC NTV Television Company, © 2019 JSC TRK Armed Forces “ZVEZDA”, © 2013, 2017 JSC TV and Radio Company Petersburg, © 2014-2015, 2017 Korrespondent.net, © 2014-2019 Latin Post, © 2017 LLC Business Newspaper "Vzglyad", © 2017 LLC RTVIA Production, © 2014 Los Angeles Times, © 2018 Media Corporation of Extremadura SA, © 2015-2016 Meduza, © 2015-2016 MIA Russia Today, © 2018 Miami Herald, © 2018 Miami New Times, LLC, © 2018 Microsoft, © 2018 MintPress News, © 2019 Natural News Network, © 2015, 2017 NBC Universal, © 2017 News24Today, © 2019 NEWS.am, © 2018 NEWSONE.UA, © 2018-2019 Newspaper First Edition, © 2017 News up to date, © 2016 Newsweek Digital LLC, © 2018 Nextstar Media Inc., © 2018-2020 Nezavisimaya Gazeta, © 2014 Nine Digital Network, © 2020 NOTICIAS AL DIA Y A LA HORA, © 2019 Novaya Gazeta, © 2017-2018 npr, © 2020 OAS, © 2020 Orlando Sentinel, © 2020 Our newspaper, © 2018 PJmedia.com/Salem Media, © 2018 Polit.ru, © 2017 PolitRussia, © 2013-2019 Pravda.Ru LLC, © 2015-2016 Present Time, © 2013, 2018 Publishing House | |
Type (DCMI): | MovingImage | |
Software | ||
Sound | ||
StillImage | ||
Text | ||
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2024T04 | |
DateStamp: | 2024-04-15 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher. 2024. Linguistic Data Consortium. | |
Terms: | area_Europe country_ES country_GB country_RU dcmi_MovingImage dcmi_Software dcmi_Sound dcmi_StillImage dcmi_Text iso639_eng iso639_rus iso639_spa olac_primary_text |