Faculty of Engineering and Natural Sciences
FENS SEMINARS
The Machine Learning of Natural Language Analyzers via
Cross-Language Information Projection and Monolingual Bootstrapping
David Yarowsky,
Johns Hopkins University, Baltimore, USA
ABSTRACT
This talk will present a set of methods for transferring diverse linguistic knowledge between languages via statistically aligned parallel bilingual text corpora. Given computer programs that assign syntactic or semantic analyses to words in one language, we would like to transfer these analysis capabilities to new languages. The parallel information content in aligned bilingual sentence translations provides a pathway for transferring diverse linguistic annotations to the second language. This talk will present noise-robust techniques for inducing several stand-alone analysis tools, including part-of-speech taggers and morphological analyzers, in linguistically diverse foreign languages, starting with no existing training data in the second language. It will also present alternative unsupervised approaches for inducing bilingual dictionaries and morphological analyzers from word-association statistics in monolingual text corpora alone. Together, these techniques offer the potential to efficiently transfer existing investments in linguistic analysis and information extraction capabilities from some languages to other languages of interest.
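The basic idea of projecting annotations across aligned sentence pairs can be sketched in a few lines. This is a minimal illustration only, assuming word alignments are already available (for instance from a statistical word aligner) and a coarse part-of-speech tagset. It shows direct tag projection plus corpus-level vote pooling, not the noise-robust training procedures presented in the talk. The function names `project_tags` and `build_tag_lexicon` and the toy English-to-Spanish sentences are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (source_index, target_index) pairs; assumed to come from an external word aligner.
Alignment = List[Tuple[int, int]]
# One training example: source-language tags, target-language words, alignment.
Example = Tuple[List[str], List[str], Alignment]


def project_tags(src_tags: List[str], tgt_len: int,
                 alignment: Alignment) -> List[Optional[str]]:
    """Copy each source tag onto the target token(s) it is aligned to.

    A target token aligned to several source tokens takes the tag with the
    most votes; unaligned target tokens get None.
    """
    votes = [Counter() for _ in range(tgt_len)]
    for s, t in alignment:
        votes[t][src_tags[s]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def build_tag_lexicon(corpus: Iterable[Example]) -> Dict[str, Dict[str, float]]:
    """Pool projected tags over a corpus into relative frequencies P(tag | word).

    Counting over many sentences lets an occasional bad alignment be outvoted.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src_tags, tgt_words, alignment in corpus:
        projected = project_tags(src_tags, len(tgt_words), alignment)
        for word, tag in zip(tgt_words, projected):
            if tag is not None:
                counts[word.lower()][tag] += 1
    return {w: {tag: n / sum(c.values()) for tag, n in c.items()}
            for w, c in counts.items()}


# Hypothetical toy English->Spanish data, invented for illustration only.
TOY_CORPUS: List[Example] = [
    # "the house falls" -> "la casa cae"
    (["DET", "NOUN", "VERB"], ["la", "casa", "cae"], [(0, 0), (1, 1), (2, 2)]),
    # "the red house" -> "la casa roja" (correct crossing alignment)
    (["DET", "ADJ", "NOUN"], ["la", "casa", "roja"], [(0, 0), (2, 1), (1, 2)]),
    # "the big house" -> "la casa grande" (deliberately wrong monotone alignment)
    (["DET", "ADJ", "NOUN"], ["la", "casa", "grande"], [(0, 0), (1, 1), (2, 2)]),
]

if __name__ == "__main__":
    lexicon = build_tag_lexicon(TOY_CORPUS)
    for word, dist in sorted(lexicon.items()):
        best = max(dist, key=dist.get)
        print(word, best, round(dist[best], 2))
```

In the toy corpus, the third sentence carries a deliberately wrong alignment that projects ADJ onto "casa"; pooling over all three sentences still leaves NOUN as its most probable tag, at 2/3. A word-to-tag distribution of this kind is one plausible starting point for training a stand-alone tagger in the second language.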
David Yarowsky is a Professor of Computer Science at Johns Hopkins University and a member of its Center for Language and Speech Processing. His research interests include machine translation, multilingual natural language processing, and corpus-based machine learning of natural language.
Wednesday, April 27th, 2005 - 14:40-15:30, FENS 2019