OLAC Record oai:www.ldc.upenn.edu:LDC2018T21 |
Metadata | ||
Title: | TRAD Arabic-French Parallel Text -- Newswire | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Linguistic Data Consortium, and ELDA. TRAD Arabic-French Parallel Text -- Newswire LDC2018T21. Web Download. Philadelphia: Linguistic Data Consortium, 2018 | |
Contributor: | Linguistic Data Consortium | |
ELDA | ||
Date (W3CDTF): | 2018 | |
Date Issued (W3CDTF): | 2018-10-15 | |
Description: | *Introduction* TRAD Arabic-French Parallel Text -- Newswire was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 20,000 Arabic words from NIST 2008 Open Machine Translation (OpenMT) Evaluation (LDC2010T21). The PEA-TRAD project (Translation as a Support for Document Analysis) was supported by the French Ministry of Defense (DGA). Its purpose was to develop speech-to-speech translation technology for multiple languages (e.g., Arabic, Chinese, Pashto) from a variety of domains. ELDA developed several corpora for this effort. The Linguistic Data Consortium (LDC) has also released the following TRAD corpora: * TRAD Chinese-French Parallel Text -- Blog (LDC2018T02) * TRAD Arabic-French Parallel Text -- Newsgroup (LDC2018T13) * TRAD Chinese-French Parallel Text -- Broadcast News (LDC2018T17) *Data* This release consists of 813 segments (translation units) from 74 documents. The source data is Arabic newswire text collected and translated into English by LDC. Information about the ELDA translation team, translation guidelines and validation results is contained in the documentation accompanying this release. The Arabic source file contains 19,902 words and the French reference translation contains 29,104 words. The data is presented in two Unicode-encoded XML files along with an associated DTD. *Samples* Please view this Arabic sample and French sample. *Updates* None at this time. | |
Extent: | Corpus size: 1392 KB | |
Identifier: | LDC2018T21 | |
https://catalog.ldc.upenn.edu/LDC2018T21 | ||
ISBN: 1-58563-860-9 | ||
ISLRN: 121-864-813-997-4 | ||
DOI: 10.35111/z1wg-9x78 | ||
Language: | Arabic | |
Standard Arabic | ||
French | ||
Language (ISO639): | ara | |
arb | ||
fra | ||
License: | TRAD Arabic-French Parallel Text – Newswire Agreement (For-Profit): https://catalog.ldc.upenn.edu/license/trad-arabic-french-parallel-text-newswire-agreement-for-profit.pdf | |
TRAD Arabic-French Parallel Text – Newswire Agreement (Non-Member): https://catalog.ldc.upenn.edu/license/trad-arabic-french-parallel-text-newswire-agreement-non-member.pdf | ||
TRAD Arabic-French Parallel Text – Newswire Agreement (Not-For-Profit): https://catalog.ldc.upenn.edu/license/trad-arabic-french-parallel-text-newswire-agreement-not-for-profit.pdf | ||
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2018T21 | |
Rights Holder: | Portions © 2007 Agence France Presse, Al-Ahram, Al Hayat, An Nahar, Al Quds-Al Arabi, Asharq Al-Awsat, Assabah, Xinhua News Agency, © 2018 ELDA, © 2007, 2009, 2010, 2018 Trustees of the University of Pennsylvania | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2018T21 | |
DateStamp: | 2020-11-30 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Linguistic Data Consortium; ELDA. 2018. Linguistic Data Consortium. | |
Terms: | area_Asia area_Europe country_FR country_SA dcmi_Text iso639_ara iso639_arb iso639_fra olac_primary_text |
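
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a rough sketch of how such request URLs are formed, the snippet below builds them from the OaiIdentifier listed in this record. The base URL is a placeholder, since the repository endpoint is not stated here; `oai_dc` is the simple Dublin Core prefix defined by OAI-PMH, and `olac` is assumed to be the prefix the repository uses for OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): this record does not state the
# repository's OAI-PMH base URL, so substitute the real one before use.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2018T21"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" is assumed to be the OLAC metadata prefix; "oai_dc" is simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Fetching either URL from the real endpoint should return an XML response containing the record's metadata in the requested format.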