EBookClubs

Read Books & Download eBooks Full Online

Book Gaussian Alignments in Statistical Translation Models

Download or read book Gaussian Alignments in Statistical Translation Models written by Ali H. Mohammad and published by . This book was released on 2006 with total page 106 pages. Available in PDF, EPUB and Kindle. Book excerpt: Machine translation software has been under development almost since the birth of the electronic computer. Current state-of-the-art methods use statistical techniques to learn how to translate from one natural language to another from a corpus of hand-translated text. The success of these techniques comes from two factors: a simple statistical model and vast training data sets. The standard agenda for improving such models is to enable them to model greater complexity; however, it is a byword within the machine learning community that added complexity must be supported with more training data. Given that current models already require huge amounts of data, our agenda is instead to simplify current models before adding extensions. We present one such simplification, which results in fewer than 10% as many alignment model parameters and produces results competitive with the original model. An unexpected benefit of this technique is that it naturally gives a measure for how difficult it is to translate from one language to another given a data set. Next, we present one suggestion for adding complexity to model new behavior.

Book Alignment Models and Algorithms for Statistical Machine Translation

Download or read book Alignment Models and Algorithms for Statistical Machine Translation written by James Jonathan Jesse Brunning and published by . This book was released on 2010 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book On Word Alignment Models for Statistical Machine Translation

Download or read book On Word Alignment Models for Statistical Machine Translation written by Shaojun Zhao and published by . This book was released on 2011 with total page 240 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement. We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric. We also designed algorithms that mine massive and high quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system."--Leaf v.
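
For readers unfamiliar with the alignment error rate (AER) mentioned in this excerpt, here is a minimal sketch of the commonly used definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the set of "sure" gold links, and P the set of "possible" gold links (with S a subset of P). The function name and the toy link sets are hypothetical and are not taken from the book.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is a set of (source_index, target_index) links.
    Names and data layout are illustrative assumptions.
    """
    possible = possible | sure  # make sure every sure link also counts as possible
    denom = len(predicted) + len(sure)
    if denom == 0:
        return 0.0  # nothing predicted and nothing expected
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denom


# Toy example (hypothetical data): one wrong link, one missed sure link.
predicted = {(0, 0), (1, 2), (2, 1)}
sure = {(0, 0), (1, 1)}
possible = {(0, 0), (1, 1), (2, 1)}
aer = alignment_error_rate(predicted, sure, possible)
```

Lower is better: a prediction that reproduces exactly the sure links scores 0.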

Book Constrained Word Alignment Models for Statistical Machine Translation

Download or read book Constrained Word Alignment Models for Statistical Machine Translation written by Ma Yanjun and published by . This book was released on 2009 with total page 207 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Discriminative Alignment Models For Statistical Machine Translation

Download or read book Discriminative Alignment Models For Statistical Machine Translation written by Nadi Tomeh and published by . This book was released on 2012 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Bitext alignment is the task of aligning a text in a source language and its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed seemingly across languages. It varies along linguistic properties and cultural backgrounds of different languages, and also depends on the translation strategy that has been used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process. In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to one word at most in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems, which use phrases instead of words, a heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both the problems of word alignment and phrase pairs extraction. We improve the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment. In this framework, links are predicted independently from one another using a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for a part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment. Additionally, input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state of the art results in terms of alignment quality as measured by the alignment error rate, and translation quality as measured by BLEU on large-scale Arabic-English NIST'09 systems. We also present a translation quality informed procedure for both extraction and evaluation of phrase pairs. We reformulate the problem in the supervised framework in which we decide for each phrase pair whether we keep it or not in the translation model. This offers a principled way to combine several features to make the procedure more robust to alignment difficulties. We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation. Using machine learning techniques based on positive examples only, these annotations can be used to learn phrase alignment decisions. Using this approach we obtain improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.

Book The Impact of Statistical Word Alignment Quality and Structure in Phrase Based Statistical Machine Translation

Download or read book The Impact of Statistical Word Alignment Quality and Structure in Phrase Based Statistical Machine Translation written by Francisco Javier Guzmán Herrera and published by . This book was released on 2011 with total page 121 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical Word Alignments represent lexical word-to-word translations between source and target language sentences. They are considered the starting point for many state of the art Statistical Machine Translation (SMT) systems. In this dissertation, we perform an in-depth study of the impact of word alignments at different stages of the phrase-based statistical machine translation pipeline, namely word alignment, phrase extraction, phrase scoring and decoding. Moreover, we establish a multivariate prediction model for different variables of the translation model and overall translation quality using word alignment structure. Based on those models, we identify the most important alignment variables and propose two alternatives to provide more control over alignment structure and thus improve SMT. Our results show that incorporating alignment structure into decoding, via alignment gap features, yields significant improvements, especially in situations where translation data is limited.

Book Syntax based Statistical Machine Translation

Download or read book Syntax based Statistical Machine Translation written by Philip Williams and published by Springer Nature. This book was released on 2022-05-31 with total page 190 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.

Book Joint Prediction of Word Alignment and Alignment Types for Statistical Machine Translation

Download or read book Joint Prediction of Word Alignment and Alignment Types for Statistical Machine Translation written by Te Bu and published by . This book was released on 2015 with total page 49 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learning word alignments between parallel sentence pairs is an important task in Statistical Machine Translation. Existing models for word alignment have assumed that word alignment links are untyped. In this work, we propose new machine learning models that use linguistically informed link types to enrich word alignments. We use 11 different alignment link types based on annotated data released by the Linguistic Data Consortium. We first provide a solution to the sub-problem of alignment type prediction given an aligned word pair and then propose two different models to simultaneously predict word alignment and alignment types. Our experimental results show that we can recover alignment link types with an F-score of 81.4%. Our joint model improves the word alignment F-score by 4.6% over a baseline that does not use typed alignment links. We expect typed word alignments to benefit SMT and other NLP tasks that rely on word alignments.

Book Linguistically Motivated Statistical Machine Translation

Download or read book Linguistically Motivated Statistical Machine Translation written by Deyi Xiong and published by Springer. This book was released on 2015-02-11 with total page 159 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also serves the purpose of promoting the in-depth study of the impacts of linguistic knowledge on machine translation. Finally it provides a systematic introduction of Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.

Book Grammar Inference and Statistical Machine Translation

Download or read book Grammar Inference and Statistical Machine Translation written by Ye-Yi Wang and published by . This book was released on 1998 with total page 137 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "NLP researchers face a dilemma: on one side, it is unarguably accepted that languages have internal structure rather than strings of words. On the other side, they find it very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using a statistical models [sic] to depict the translation process. Most of the translation models are word-based. While the approach has achieved surprisingly good performance comparable to the best commercial systems, many questions remain in the machine translation community. Can the statistical word-based translation still perform well on language pairs with radically different linguistic structures? How would it function with less training data or with spoken languages? The thesis work investigated these questions. In summary, word-based alignment model is a major cause of errors in German-English statistical spoken language translation. To account for this problem, a structure-based alignment model is introduced. This new model takes advantages of a bilingual grammar inference algorithm, which can automatically acquire shallow phrase structures used by the model. The structure-based model can directly depict the structure difference between English and German spoken languages. It also results in focused learning of word alignment, therefore it can alleviate the sparse data problem. The structure-based model achieved 11 percent error reduction over the state-of-the-art statistical machine translation models."

Book Statistical Machine Translation

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2009-12-17 with total page 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Book Information Retrieval Technology

Download or read book Information Retrieval Technology written by Mohamed Vall Mohamed Salem and published by Springer. This book was released on 2011-12-14 with total page 639 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 7th Asia Information Retrieval Societies Conference AIRS 2011, held in Dubai, United Arab Emirates, in December 2011. The 31 revised full papers and 25 revised poster papers presented were carefully reviewed and selected from 132 submissions. All current aspects of information retrieval - in theory and practice - are addressed; the papers are organized in topical sections on information retrieval models and theories; information retrieval applications and multimedia information retrieval; user study, information retrieval evaluation and interactive information retrieval; Web information retrieval, scalability and adversarial information retrieval; machine learning for information retrieval; natural language processing for information retrieval; Arabic script text processing and retrieval.

Book Improving Statistical Alignment and Translation Using Highly Multilingual Corpora

Download or read book Improving Statistical Alignment and Translation Using Highly Multilingual Corpora written by Camelia Ignat and published by . This book was released on 2009 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Pattern Recognition and Image Analysis

Download or read book Pattern Recognition and Image Analysis written by Jordi Vitria and published by Springer Science & Business Media. This book was released on 2011-06-01 with total page 773 pages. Available in PDF, EPUB and Kindle. Book excerpt: This volume constitutes the refereed proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2011, held in Las Palmas de Gran Canaria, Spain, in June 2011. The 34 revised full papers and 58 revised poster papers presented were carefully reviewed and selected from 158 submissions. The papers are organized in topical sections on computer vision; image processing and analysis; medical applications; and pattern recognition.

Book Multilingual Unsupervised Word Alignment Models and Their Application

Download or read book Multilingual Unsupervised Word Alignment Models and Their Application written by Anahita Mansouri Bigvand and published by . This book was released on 2021 with total page 97 pages. Available in PDF, EPUB and Kindle. Book excerpt: Word alignment is an essential task in natural language processing because of its critical role in training statistical machine translation (SMT) models, error analysis for neural machine translation (NMT), building bilingual lexicons, and annotation transfer. In this thesis, we explore models for word alignment, how they can be extended to incorporate linguistically-motivated alignment types, and how they can be neuralized in an end-to-end fashion. In addition to these methodological developments, we apply our word alignment models to cross-lingual part-of-speech projection. First, we present a new probabilistic model for word alignment where word alignments are associated with linguistically-motivated alignment types. We propose a novel task of joint prediction of word alignment and alignment types and propose novel semi-supervised learning algorithms for this task. We also solve a sub-task of predicting the alignment type given an aligned word pair. The proposed joint generative models (alignment-type-enhanced models) significantly outperform the models without alignment types in terms of word alignment and translation quality. Next, we present an unsupervised neural Hidden Markov Model for word alignment, where emission and transition probabilities are modeled using neural networks. The model is simpler in structure, allows for seamless integration of additional context, and can be used in an end-to-end neural network. Finally, we tackle the part-of-speech tagging task for the zero-resource scenario where no part-of-speech (POS) annotated training data is available. We present a cross-lingual projection approach where neural HMM aligners are used to obtain high quality word alignments between resource-poor and resource-rich languages. Moreover, high quality neural POS taggers are used to provide annotations for the resource-rich language side of the parallel data, as well as to train a tagger on the projected data. Our experimental results on truly low-resource languages show that our methods outperform their corresponding baselines.

Book Reordering Metrics for Statistical Machine Translation

Download or read book Reordering Metrics for Statistical Machine Translation written by Alexandra Birch and published by . This book was released on 2011 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Natural languages display a great variety of different word orders, and one of the major challenges facing statistical machine translation is in modelling these differences. This thesis is motivated by a survey of 110 different language pairs drawn from the Europarl project, which shows that word order differences account for more variation in translation performance than any other factor. This wide ranging analysis provides compelling evidence for the importance of research into reordering. There has already been a great deal of research into improving the quality of the word order in machine translation output. However, there has been very little analysis of how best to evaluate this research. Current machine translation metrics are largely focused on evaluating the words used in translations, and their ability to measure the quality of word order has not been demonstrated. In this thesis we introduce novel metrics for quantitatively evaluating reordering. Our approach isolates the word order in translations by using word alignments. We reduce alignment information to permutations and apply standard distance metrics to compare the word order in the reference to that of the translation. We show that our metrics correlate more strongly with human judgements of word order quality than current machine translation metrics. We also show that a combined lexical and reordering metric, the LRscore, is useful for training translation model parameters. Humans prefer the output of models trained using the LRscore as the objective function, over those trained with the de facto standard translation metric, the BLEU score. The LRscore thus provides researchers with a reliable metric for evaluating the impact of their research on the quality of word order.
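
The idea of reducing alignments to permutations and comparing word orders with a distance metric can be illustrated with a small sketch. It uses a normalized Kendall's tau distance (the fraction of word pairs that appear in a different relative order) as one example of a standard permutation distance. The choice of metric, the normalization, and the function name are assumptions made for illustration and may differ from the exact definitions used in the thesis.

```python
from itertools import combinations


def kendall_tau_distance(perm):
    """Fraction of position pairs whose order is inverted.

    perm[i] is assumed to be the reference position of the i-th word of the
    translation, so the identity permutation means identical word order.
    Returns 0.0 for identical order and 1.0 for fully reversed order.
    """
    n = len(perm)
    if n < 2:
        return 0.0  # a single word cannot be reordered
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if perm[i] > perm[j]
    )
    return discordant / (n * (n - 1) / 2)


same_order = kendall_tau_distance([0, 1, 2, 3])   # identical word order
reversed_order = kendall_tau_distance([3, 2, 1, 0])  # fully reversed
one_swap = kendall_tau_distance([1, 0, 2, 3])     # one adjacent swap
```

A score derived from such a distance (for example, one minus the distance) could then be combined with a lexical metric, which is the general shape of a combined lexical and reordering score as described in the excerpt.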