OLAC Record oai:catalogue.elra.info:ELRA-W0031 |
Metadata | ||
Title: | GeFRePaC - German French Reciprocal Parallel Corpus | |
Access Rights: | Rights available for: nonCommercialUse | |
Date Available (W3CDTF): | 2002-01-15 | |
Date Issued (W3CDTF): | 2002-01-15 | |
Date Modified (W3CDTF): | 2017-06-26 | |
Description: | The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Research Abteilung Lexik, Institut für Deutsche Sprache (Germany), with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). GeFRePaC is a 30 million word corpus (15 million for each language) intended for developing, enhancing and improving translation aids (dictionaries, lexicons, platforms) for French-German and German-French translation. The database consists of the following parallel corpora: European Union CELEX Database: Treaties, Foreign relations, Law, Complementary Law and all the published documents of the "European Parliament". Celex-Database: 22,000,000 words (German+French). Europarl: 8,320,000 words (German+French). It covers natural general language as used in public socio-political discourse, with a focus on multilingual administration and on commercial and legal documentation. GeFRePaC comprises a large variety of text types for which there is a rapidly growing need for translation but which currently defy successful machine translation. The corpus is encoded according to the PAROLE guidelines; it was aligned at the sentence level and, for single-word translation units, at the lexical level, POS-tagged in conformity with EAGLES recommendations, and validated according to the most current version of the ELRA guidelines. The parallel German-French texts were aligned using a program developed at the Equipe Langue et Dialogue, Laboratoire Loria, Nancy. The text files containing markup for paragraphs and sentences were processed by the Tree Tagger developed at the IMS Stuttgart. The text files are automatically converted into TEI-conformant SGML format. | |
Identifier: | ELRA-W0031 | |
ISLRN: 086-761-267-762-3 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0031/ | |
Language: | German | |
French | ||
Language (ISO639): | deu | |
fra | ||
Medium: | downloadable | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0031 | |
DateStamp: | 2002-01-15 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2002. ELRA (European Language Resources Association). | |
Terms: | area_Europe country_DE country_FR dcmi_Text iso639_deu iso639_fra olac_primary_text |
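
The GetRecord entries listed above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a rough illustration only, the following Python sketch shows how such request URLs are typically assembled from the record's OAI identifier. The base URL is a hypothetical placeholder (the record does not state the repository endpoint), the helper name `build_get_record_url` is invented for this example, and the metadata prefixes `olac` and `oai_dc` are assumed to be the ones the repository exposes.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Return an OAI-PMH GetRecord request URL for a single record."""
    # GetRecord takes exactly these three query arguments.
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# OAI identifier taken from the record above.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0031"

olac_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")   # OLAC format (assumed prefix)
dc_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")   # simple DC format

print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by `urlencode`, which is what a conforming OAI-PMH endpoint expects in a query string.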