OLAC Record: YAWA - Yet Another Word Aligner

OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1300

Metadata

Title: YAWA - Yet Another Word Aligner

Bibliographic Citation: http://hdl.handle.net/11372/LRT-1300

Contributor: Tufiş, Dan

Ion, Radu

Date (W3CDTF): 2014-07-30T21:34:22Z

Date Available: 2014-07-30T21:34:22Z

Description: YAWA is a four stage lexical aligner that uses bilingual translation lexicons produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]] and phrase boundaries detection to align words of a given bitext. Using this alignment, in stage 2 a language dependent module takes over and produces alignments of the remaining lexical tokens within aligned chunks. Stage 3 is specialized in aligning blocks of consecutive unaligned tokens and stage 4 deletes alignments that are likely to be wrong. Developed in PERL, YAWA is language independent, except for the modules that realise alignments specific to the pairs of aligned languages. So far, it works just for Ro-En pair of languages. It requires a parallel corpus in [[http://www.xces.org|XCES]] format, morpho-syntactically annotated and lemmatized (using [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]]), and translation dictionaries produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor|TREQ]]. YAWA’s individual F-measure is 81.22%. Currently YAWA is a part of the [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]] combined lexical alignment platform. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]: -- Radu Ion (2007). Word Sense Disambiguation Methods Applied to English and Romanian. (in Romanian). PhD thesis. Romanian Academy, Bucharest -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.

Identifier (URI): http://hdl.handle.net/11372/LRT-1300

Language: English

Romanian

Language (ISO639): eng

ron

Publisher: Research Institute for Artificial Intelligence, Romanian Academy of Sciences

Subject: word aligner

Type: toolService

Type (DCMI): Software

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11372/LRT-1300

DateStamp: 2021-06-29

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Tufiş, Dan; Ion, Radu. 2014. Research Institute for Artificial Intelligence, Romanian Academy of Sciences.
Terms: area_Europe country_GB country_RO dcmi_Software iso639_eng iso639_ron

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1300
Up-to-date as of: Sun May 4 0:10:55 EDT 2025

Metadata
Title:		YAWA - Yet Another Word Aligner
Bibliographic Citation:		http://hdl.handle.net/11372/LRT-1300
Contributor:		Tufiş, Dan
Contributor:		Ion, Radu
Date (W3CDTF):		2014-07-30T21:34:22Z
Date Available:		2014-07-30T21:34:22Z
Description:		YAWA is a four stage lexical aligner that uses bilingual translation lexicons produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor\|TREQ]] and phrase boundaries detection to align words of a given bitext. Using this alignment, in stage 2 a language dependent module takes over and produces alignments of the remaining lexical tokens within aligned chunks. Stage 3 is specialized in aligning blocks of consecutive unaligned tokens and stage 4 deletes alignments that are likely to be wrong. Developed in PERL, YAWA is language independent, except for the modules that realise alignments specific to the pairs of aligned languages. So far, it works just for Ro-En pair of languages. It requires a parallel corpus in [[http://www.xces.org\|XCES]] format, morpho-syntactically annotated and lemmatized (using [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts\|TTL]]), and translation dictionaries produced by [[http://www.clarin.eu/tools/translation-equivalents-extractor\|TREQ]]. YAWA’s individual F-measure is 81.22%. Currently YAWA is a part of the [[http://www.clarin.eu/tools/cowal-combined-word-aligner\|COWAL]] combined lexical alignment platform. More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers\|the following papers]]: -- Radu Ion (2007). Word Sense Disambiguation Methods Applied to English and Romanian. (in Romanian). PhD thesis. Romanian Academy, Bucharest -- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9. -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
Identifier (URI):		http://hdl.handle.net/11372/LRT-1300
Language:		English
Language:		Romanian
Language (ISO639):		eng
Language (ISO639):		ron
Publisher:		Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Subject:		word aligner
Type:		toolService
Type (DCMI):		Software
OLAC Info
Archive:		LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:		http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:		OAI-PMH request for OLAC format
GetRecord:		Pre-generated XML file
OAI Info
OaiIdentifier:		oai:lindat.mff.cuni.cz:11372/LRT-1300
DateStamp:		2021-06-29
GetRecord:		OAI-PMH request for simple DC format
Search Info
Citation:		Tufiş, Dan; Ion, Radu. 2014. Research Institute for Artificial Intelligence, Romanian Academy of Sciences.
Terms:		area_Europe country_GB country_RO dcmi_Software iso639_eng iso639_ron