OLAC Record
oai:catalogue.elra.info:ELRA-W0336

Metadata
Title:Parallel Corpora & Domains (bilingual and multilingual)
Access Rights: Rights available for: commercialUse
Coverage:Portugal
Brazil
Date Available (W3CDTF):2023-10-04
Date Issued (W3CDTF):2023-10-04
Description:Parallel corpora for nearly 400 language pairs and numerous multilingual combinations, including 10 million bilingual segments and 90 million tokens in 20 languages: Arabic, Chinese (Simplified), Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Italian, Japanese, Korean, North Sami, Norwegian, Polish, Portuguese (Brazilian and European), Russian, Spanish, Swedish, and Turkish. The segments consist of full sentences and short phrases with their translation equivalents, based on corpus evidence and frequency. They were originally created by editors and translators worldwide as usage examples for dictionary entries. Some of the bilingual pairs were generated via a third (pivot) language. The data can be used to train machine learning models and large language models, and to improve the performance of machine translation systems. Besides general-language vocabulary, there are segments for over a hundred vertical domains: administration, advertising, aeronautics, agriculture, anatomy, anthropology, archaeology, architecture, art, astrology, astronomy, automobiles, aviation, biology, botanics, cartography, chemistry, cinema, clothing, color, commerce, computers, construction, cosmetics, culinary, culture, dance, data, dress, drinks, drugs, ecology, economics, education, electricity, electronics, energy, engineering, entertainment, environment, family, fashion, finance, furniture, games, genetics, geography, geology, geometry, grammar, health, history, hygiene, industry, informatics, Internet, IT, journalism, law, leisure/hobbies, linguistics, literature, maritime, marketing, mathematics, measurements/units, mechanics, medicine, meteorology, military, music, mythology, nautical, occupation, oceanography, optics, pharmacology, philosophy, photography, physics, physiology, police, politics, post, psychology, publishing, radio, rail, religion, school, sex, sociology, space, sport, statistics, technical, technology, telecommunication, telephone, television, theatre, theology, time, tourism, transportation, university, zoology. Note: prices are quoted per segment. Please contact us for a quotation covering the languages and domains you require.
Identifier:ELRA-W0336
ISLRN: 471-919-856-164-1
Identifier (URI):https://catalog.elra.info/en-us/repository/browse/ELRA-W0336/
Language:Russian
Chinese
German
English
Dutch; Flemish
Modern Greek (1453-)
Portuguese
Danish
Norwegian
Northern Sami
Italian
Japanese
Korean
Spanish; Castilian
Polish
Swedish
Arabic
Finnish
Hebrew
French
Turkish
Language (ISO639):rus
zho
deu
eng
nld
ell
por
dan
nor
sme
ita
jpn
kor
spa
pol
swe
ara
fin
heb
fra
tur
Medium:Not specified
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive: ELRA Catalogue of Language Resources
Description: http://www.language-archives.org/archive/catalogue.elra.info

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-W0336
DateStamp: 2023-10-04

Search Info

Citation: n.a. 2023. ELRA (European Language Resources Association).
Terms: area_Asia area_Europe country_DE country_DK country_ES country_FI country_FR country_GB country_GR country_IL country_IT country_JP country_KR country_NL country_NO country_PL country_PT country_RU country_SE country_TR dcmi_Text iso639_ara iso639_dan iso639_deu iso639_ell iso639_eng iso639_fin iso639_fra iso639_heb iso639_ita iso639_jpn iso639_kor iso639_nld iso639_nor iso639_pol iso639_por iso639_rus iso639_sme iso639_spa iso639_swe iso639_tur iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0336
Up-to-date as of: Fri Apr 19 6:30:39 EDT 2024