OLAC Record oai:catalogue.elra.info:ELRA-W0336 |
Metadata | ||
Title: | Parallel Corpora & Domains (bilingual and multilingual) | |
Access Rights: | Rights available for: commercialUse | |
Coverage: | Portugal | |
Brazil | ||
Date Available (W3CDTF): | 2023-10-04 | |
Date Issued (W3CDTF): | 2023-10-04 | |
Description: | Parallel corpora for nearly 400 language pairs and numerous multilingual combinations, including 10 million bilingual segments and 90 million tokens in 20 languages: Arabic, Chinese (Simplified), Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Italian, Japanese, Korean, North Sami, Norwegian, Polish, Portuguese (Brazilian and European), Russian, Spanish, Swedish, and Turkish. The segments consist of full sentences and short phrases with translation equivalents, based on corpus evidence and frequency, and were originally created by editors and translators worldwide as examples of usage for dictionary entries. Some of the bilingual pairs were generated via a third pivot language. The data can be applied to train Machine Learning and Large Language Models and to boost the performance of Machine Translation solutions. Besides general language vocabularies, there are segments for over a hundred vertical domains: administration, advertising, aeronautics, agriculture, anatomy, anthropology, archaeology, architecture, art, astrology, astronomy, automobiles, aviation, biology, botanics, cartography, chemistry, cinema, clothing, color, commerce, computers, construction, cosmetics, culinary, culture, dance, data, dress, drinks, drugs, ecology, economics, education, electricity, electronics, energy, engineering, entertainment, environment, family, fashion, finance, furniture, games, genetics, geography, geology, geometry, grammar, health, history, hygiene, industry, informatics, Internet, IT, journalism, law, leisure/hobbies, linguistics, literature, maritime, marketing, mathematics, measurements/units, mechanics, medicine, meteorology, military, music, mythology, nautical, occupation, oceanography, optics, pharmacology, philosophy, photography, physics, physiology, police, politics, post, psychology, publishing, radio, rail, religion, school, sex, sociology, space, sport, statistics, technical, technology, telecommunication, telephone, television, theatre, theology, time, tourism, transportation, university, zoology. Note: Prices are indicated per segment unit. Please contact us to obtain our quotation corresponding to expected languages and domains. | |
Identifier: | ELRA-W0336 | |
ISLRN: 471-919-856-164-1 | ||
Identifier (URI): | https://catalog.elra.info/en-us/repository/browse/ELRA-W0336/ | |
Language: | Russian | |
Chinese | ||
German | ||
English | ||
Dutch; Flemish | ||
Modern Greek (1453-) | ||
Portuguese | ||
Danish | ||
Norwegian | ||
Northern Sami | ||
Italian | ||
Japanese | ||
Korean | ||
Spanish; Castilian | ||
Polish | ||
Swedish | ||
Arabic | ||
Finnish | ||
Hebrew | ||
French | ||
Turkish | ||
Language (ISO639): | rus | |
zho | ||
deu | ||
eng | ||
nld | ||
ell | ||
por | ||
dan | ||
nor | ||
sme | ||
ita | ||
jpn | ||
kor | ||
spa | ||
pol | ||
swe | ||
ara | ||
fin | ||
heb | ||
fra | ||
tur | ||
Medium: | Not specified | |
Publisher: | ELRA (European Language Resources Association) | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ELRA Catalogue of Language Resources | |
Description: | http://www.language-archives.org/archive/catalogue.elra.info | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:catalogue.elra.info:ELRA-W0336 | |
DateStamp: | 2023-10-04 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | n.a. 2023. ELRA (European Language Resources Association). | |
Terms: | area_Asia area_Europe country_DE country_DK country_ES country_FI country_FR country_GB country_GR country_IL country_IT country_JP country_KR country_NL country_NO country_PL country_PT country_RU country_SE country_TR dcmi_Text iso639_ara iso639_dan iso639_deu iso639_ell iso639_eng iso639_fin iso639_fra iso639_heb iso639_ita iso639_jpn iso639_kor iso639_nld iso639_nor iso639_pol iso639_por iso639_rus iso639_sme iso639_spa iso639_swe iso639_tur iso639_zho olac_primary_text |
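
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a minimal sketch, such a request URL can be assembled from the OaiIdentifier shown in this record. The `verb`, `identifier`, and `metadataPrefix` parameters come from the OAI-PMH protocol, and `olac` and `oai_dc` are assumed to be the prefixes for the two formats. The base URL below is a hypothetical placeholder, since this record does not state the repository endpoint, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the record does not give the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0336"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

# Decode the query string again to confirm the identifier survives encoding.
decoded = parse_qs(urlsplit(olac_url).query)
print(olac_url)
print(dc_url)
print(decoded["identifier"][0])
```

The identifier contains colons, which `urlencode` percent-encodes; decoding the query string recovers the original value unchanged.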