Joint Prediction of Word Alignment and Alignment Types for Statistical Machine Translation

Book Details:

Author : Te Bu
Publisher :
Release : 2015
ISBN :
Pages : 49 pages

Book excerpt: Learning word alignments between parallel sentence pairs is an important task in Statistical Machine Translation. Existing models for word alignment have assumed that word alignment links are untyped. In this work, we propose new machine learning models that use linguistically informed link types to enrich word alignments. We use 11 different alignment link types based on annotated data released by the Linguistic Data Consortium. We first provide a solution to the sub-problem of alignment type prediction given an aligned word pair and then propose two different models to simultaneously predict word alignment and alignment types. Our experimental results show that we can recover alignment link types with an F-score of 81.4%. Our joint model improves the word alignment F-score by 4.6% over a baseline that does not use typed alignment links. We expect typed word alignments to benefit SMT and other NLP tasks that rely on word alignments.
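The F-scores quoted in the excerpt compare predicted links against gold links. The following is a minimal sketch of that comparison, not the thesis's evaluation code: the link tuples, the type labels "SEM" and "FUN", and the function name `prf` are illustrative assumptions.

```python
def prf(predicted, gold):
    """Return (precision, recall, F1) between two collections of links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical typed links: (source index, target index, link type).
gold = {(0, 0, "SEM"), (1, 2, "FUN"), (2, 1, "SEM")}
pred = {(0, 0, "SEM"), (1, 2, "SEM"), (2, 1, "SEM")}

# Typed score: a link counts only if position and type both match.
typed = prf(pred, gold)

# Untyped score: drop the type and compare positions only.
untyped = prf({(i, j) for i, j, _ in pred}, {(i, j) for i, j, _ in gold})
```

In this toy case every position is right but one type is wrong, so the untyped score is perfect while the typed score drops to two thirds.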

Multilingual Unsupervised Word Alignment Models and Their Application

Book Details:

Author : Anahita Mansouri Bigvand
Publisher :
Release : 2021
ISBN :
Pages : 97 pages

Book excerpt: Word alignment is an essential task in natural language processing because of its critical role in training statistical machine translation (SMT) models, error analysis for neural machine translation (NMT), building bilingual lexicons, and annotation transfer. In this thesis, we explore models for word alignment, how they can be extended to incorporate linguistically-motivated alignment types, and how they can be neuralized in an end-to-end fashion. In addition to these methodological developments, we apply our word alignment models to cross-lingual part-of-speech projection. First, we present a new probabilistic model for word alignment where word alignments are associated with linguistically-motivated alignment types. We propose a novel task of joint prediction of word alignment and alignment types and propose novel semi-supervised learning algorithms for this task. We also solve a sub-task of predicting the alignment type given an aligned word pair. The proposed joint generative models (alignment-type-enhanced models) significantly outperform the models without alignment types in terms of word alignment and translation quality. Next, we present an unsupervised neural Hidden Markov Model for word alignment, where emission and transition probabilities are modeled using neural networks. The model is simpler in structure, allows for seamless integration of additional context, and can be used in an end-to-end neural network. Finally, we tackle the part-of-speech tagging task for the zero-resource scenario where no part-of-speech (POS) annotated training data is available. 
We present a cross-lingual projection approach where neural HMM aligners are used to obtain high quality word alignments between resource-poor and resource-rich languages. Moreover, high quality neural POS taggers are used to provide annotations for the resource-rich language side of the parallel data, as well as to train a tagger on the projected data. Our experimental results on truly low-resource languages show that our methods outperform their corresponding baselines.
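Cross-lingual projection, as described above, copies tags from the resource-rich side of a sentence pair to the resource-poor side along alignment links. The sketch below assumes a simple majority vote per target token; the excerpt does not specify the thesis's actual projection rule, and `project_tags`, the tag names, and the fallback tag "X" are hypothetical.

```python
from collections import Counter


def project_tags(src_tags, links, tgt_len, default="X"):
    """Project source-side POS tags onto target tokens via alignment links.

    links holds (source index, target index) pairs. Each target token
    receives the most frequent tag among the source tokens aligned to it;
    unaligned target tokens receive the default tag.
    """
    votes = [Counter() for _ in range(tgt_len)]
    for i, j in links:
        votes[j][src_tags[i]] += 1
    return [v.most_common(1)[0][0] if v else default for v in votes]


# Toy sentence pair: three tagged source tokens, four target tokens.
src_tags = ["DET", "NOUN", "VERB"]
links = [(0, 0), (1, 1), (2, 3)]
projected = project_tags(src_tags, links, tgt_len=4)
```

Target token 2 has no link here, so it falls back to the default tag; a tagger trained on projected data would have to cope with such gaps.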

Computers

Bitext Alignment

Book Details:

Author : Jörg Tiedemann
Publisher : Morgan & Claypool Publishers
Release : 2011
ISBN : 1608455106
Pages : 168 pages

Book excerpt: This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora, starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks

The Impact of Statistical Word Alignment Quality and Structure in Phrase Based Statistical Machine Translation

Book Details:

Author : Francisco Javier Guzmán Herrera
Publisher :
Release : 2011
ISBN :
Pages : 121 pages

Book excerpt: Statistical word alignments represent lexical word-to-word translations between source and target language sentences. They are considered the starting point for many state-of-the-art Statistical Machine Translation (SMT) systems. In this dissertation, we perform an in-depth study of the impact of word alignments at different stages of the phrase-based statistical machine translation pipeline, namely word alignment, phrase extraction, phrase scoring and decoding. Moreover, we establish a multivariate prediction model for different variables of the translation model and overall translation quality using word alignment structure. Based on those models, we identify the most important alignment variables and propose two alternatives to provide more control over alignment structure and thus improve SMT. Our results show that incorporating alignment structure into decoding via alignment gap features yields significant improvements, especially in situations where translation data is limited.

Discriminative Alignment Models For Statistical Machine Translation

Book Details:

Author : Nadi Tomeh
Publisher :
Release : 2012
ISBN :
Pages :

Book excerpt: Bitext alignment is the task of aligning a text in a source language and its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed similarly across languages. It varies along linguistic properties and cultural backgrounds of different languages, and also depends on the translation strategy that has been used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process.
In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to at most one word in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems, which use phrases instead of words, a heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both the problems of word alignment and phrase pair extraction. We improve the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment. In this framework, links are predicted independently from one another using a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for a part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment. Additionally, input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state-of-the-art results in terms of alignment quality as measured by the alignment error rate, and translation quality as measured by BLEU on large-scale Arabic-English NIST'09 systems. We also present a translation quality informed procedure for both extraction and evaluation of phrase pairs. We reformulate the problem in the supervised framework in which we decide for each phrase pair whether we keep it or not in the translation model. This offers a principled way to combine several features to make the procedure more robust to alignment difficulties.
We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation. Using machine learning techniques based on positive examples only, these annotations can be used to learn phrase alignment decisions. Using this approach we obtain improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.
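The two heuristics mentioned in the excerpt, combining two directional alignments and extracting phrase pairs consistent with the result, can be sketched as follows. This is an illustration under stated assumptions rather than the dissertation's method: the excerpt does not name the combination heuristic, so plain intersection and union are used here, and the extraction keeps only the tightest target span for each source span. The function names and the `max_len` parameter are hypothetical.

```python
def symmetrize(src2tgt, tgt2src):
    """Return (intersection, union) of two directional alignments.

    src2tgt holds (i, j) links from the source-to-target run;
    tgt2src holds (j, i) links from the target-to-source run.
    """
    forward = set(src2tgt)
    backward = {(i, j) for j, i in tgt2src}
    return forward & backward, forward | backward


def consistent_phrase_pairs(links, src_len, max_len=3):
    """List ((i1, i2), (j1, j2)) spans consistent with the alignment.

    A pair is consistent when no link connects a word inside one span
    to a word outside the other span.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            tgt = [j for i, j in links if i1 <= i <= i2]
            if not tgt:
                continue  # skip source spans with no aligned word
            j1, j2 = min(tgt), max(tgt)
            if j2 - j1 >= max_len:
                continue  # target span too long
            if all(i1 <= i <= i2 for i, j in links if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs


forward_links = {(0, 0), (1, 2), (2, 1)}
backward_links = {(0, 0), (2, 1), (1, 1)}  # stored as (j, i)
inter, union = symmetrize(forward_links, backward_links)
pairs = consistent_phrase_pairs(forward_links, src_len=3)
```

The intersection favors precision and the union favors recall; heuristics used in practice typically start from the intersection and add selected links from the union.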

Technology & Engineering

Electronic Systems and Intelligent Computing

Book Details:

Author : Pradeep Kumar Mallick
Publisher : Springer Nature
Release : 2022-06-02
ISBN : 9811694885
Pages : 776 pages

Book excerpt: This book is a compilation of contributed research work from the International Conference on Electronic Systems and Intelligent Computing (ESIC 2021) and covers the areas of electronics, communication, electrical and computing. It is specifically targeted at students, research scholars and academicians with a background in electronics, communication, electrical and computer science. Advances in electronics, communication, electrical and computing cover the different approaches and techniques for specific applications using particle swarm optimization, Otsu’s function and harmony search optimization algorithm, DNA-NAND gate, triple gate SOI MOSFET, micro-Raman and FTIR analysis, high-k dielectric gate oxide, spectrum sensing in cognitive radio, microstrip antenna, GPR with conducting surfaces, energy-efficient packet routing, iBGP route reflectors, circularly polarized antenna, double fork-shaped patch radiator, implementation of Doppler radar at 24 GHz, iris image classification using SVM, digital image forgery detection, secure communication, spoken dialog system and DFT-DCT spreading strategies.

Language Arts & Disciplines

Linguistically Motivated Statistical Machine Translation

Book Details:

Author : Deyi Xiong
Publisher : Springer
Release : 2015-02-11
ISBN : 9812873562
Pages : 159 pages

Book excerpt: This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also promotes the in-depth study of the impact of linguistic knowledge on machine translation. Finally, it provides a systematic introduction to Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.

On Word Alignment Models for Statistical Machine Translation

Book Details:

Author : Shaojun Zhao
Publisher :
Release : 2011
ISBN :
Pages : 240 pages

Book excerpt: "Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement. We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric. We also designed algorithms that mine massive and high quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system."--Leaf v.
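The alignment error rate (AER) mentioned in the excerpt is conventionally computed against gold links marked as sure (S) or possible (P), with S contained in P: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted alignment. A minimal sketch of that standard definition follows; the function name and the toy links are illustrative and not taken from the thesis.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted, sure and possible are collections of (i, j) links.
    Sure links are treated as possible links as well.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (2, 2)}
aer = alignment_error_rate(predicted, sure, possible)
```

Here one sure link is found, one possible link is found, and one predicted link is wrong, giving an AER of 1 - 3/5. Lower values are better.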

Machine learning

A Machine Learning Approach to Word Alignment in Statistical Machine Translation

Book Details:

Author : Michael Camilleri (M.Sc.)
Publisher :
Release : 2009
ISBN :
Pages : 100 pages

Constrained Word Alignment Models for Statistical Machine Translation

Book Details:

Author : Ma Yanjun
Publisher :
Release : 2009
ISBN :
Pages : 207 pages

Statistical Machine Translation

Book Details:

Author : Franz Josef Och
Publisher :
Release : 2002
ISBN :
Pages : 144 pages

Computers

Statistical Machine Translation

Book Details:

Author : Philipp Koehn
Publisher : Cambridge University Press
Release : 2010
ISBN : 0521874157
Pages : 447 pages

Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Phrase Alignment Models for Statistical Machine Translation

Book Details:

Author : John Sturdy DeNero
Publisher :
Release : 2010
ISBN :
Pages : 210 pages

Book excerpt: The goal of a machine translation (MT) system is to automatically translate a document written in some human input language (e.g., Mandarin Chinese) into an equivalent document written in an output language (e.g., English). This task--so simple in its specification, and yet so rich in its complexities--has challenged computer science researchers for 60 years. While MT systems are in wide use today, the problem of producing human-quality translations remains unsolved. Statistical approaches have substantially improved the quality of MT systems by effectively exploiting parallel corpora: large collections of documents that have been translated by people, and therefore naturally occur in both the input and output languages. Broadly characterized, statistical MT systems translate an input document by matching fragments of its contents to examples in a parallel corpus, and then stitching together the translations of those fragments into a coherent document in an output language. The central challenge of this approach is to distill example translations into reusable parts: fragments of sentences that we know how to translate robustly and are likely to recur. Individual words are certainly common enough to recur, but they often cannot be translated correctly in isolation. At the other extreme, whole sentences can be translated without much context, but rarely repeat, and so cannot be recycled to build new translations. This thesis focuses on acquiring translations of phrases: contiguous sequences of a few words that encapsulate enough context to be translatable, but recur frequently in large corpora. 
We automatically identify phrase-level translations that are contained within human-translated sentences by partitioning each sentence into phrases and aligning phrases across languages. This alignment-based approach to acquiring phrasal translations gives rise to statistical models of phrase alignment. A statistical phrase alignment model assigns a score to each possible analysis of a sentence-level translation, where an analysis describes which phrases within that sentence can be translated and how to translate them. If the model assigns a high score to a particular phrasal translation, we should be willing to reuse that translation in new sentences that contain the same phrase. Chapter 1 provides a non-technical introduction to phrase alignment models and machine translation. Chapter 2 describes a complete state-of-the-art phrase-based translation system to clarify the role of phrase alignment models. The remainder of this thesis presents a series of novel models, analyses, and experimental results that together constitute a thorough investigation of phrase alignment models for statistical machine translation. Chapter 3 presents the formal properties of the class of phrase alignment models, including inference algorithms and tractability results. We present two specific models, along with statistical learning techniques to fit their parameters to data. Our experimental evaluation identifies two primary challenges to training and employing phrase alignment models, and we address each of these in turn. The first broad challenge is that generative phrase models are structured to prefer very long, rare phrases. These models require external pressure to explain observed translations using small, reusable phrases rather than large, unique ones. Chapter 4 describes three Bayesian models and a corresponding Gibbs sampler to address this challenge. These models outperform the word-level models that are widely employed in research and production MT systems. 
The second broad challenge is structural: there are many consistent and coherent ways of analyzing a translated sentence using phrases. Long phrases, short phrases, and overlapping phrases can all simultaneously express correct, translatable units. However, no previous phrase alignment models have leveraged this rich structure to predict alignments. We describe a discriminative model of multi-scale, overlapping phrases that outperforms all previously proposed models. The cumulative result of this thesis is to establish model-based phrase alignment as the most effective approach to acquiring phrasal translations. Only phrase alignment models are able to incorporate statistical signals about multi-word constructions into alignment decisions and score coherent phrasal analyses of full sentence pairs. As a result, phrase alignment models outperform classical word-level models in both generative and discriminative settings. This result is fundamental to the field: the models proposed in this thesis address a general, language-independent alignment problem that arises in all state-of-the-art statistical machine translation systems in use today.

Technology & Engineering

Biologically Inspired Techniques in Many Criteria Decision Making

Book Details:

Author : Satchidananda Dehuri
Publisher : Springer Nature
Release : 2022-06-03
ISBN : 9811687390
Pages : 718 pages

Book excerpt: This book includes selected high-quality research papers presented at the Second International Conference on Biologically Inspired Techniques in Many Criteria Decision Making (BITMDM 2021), organized by the Department of Information & Communication Technology, Fakir Mohan University, Balasore, Odisha, India, during December 20-21, 2021. These proceedings present recent advances in biologically inspired techniques and their use in many criteria decision making. The topics covered are biologically inspired algorithms, nature-inspired algorithms, multi-criteria optimization, multi-criteria decision making, data mining, big-data analysis, cloud computing, IoT, machine learning and soft computing, smart technologies, cryptanalysis, cognitive informatics, computational intelligence, artificial intelligence and machine learning, data management exploration and mining, and signal and image processing.

Aligning the Foundations of Hierarchical Statistical Machine Translation

Book Details:

Author : Gideon Maillette de Buy Wenniger
Publisher :
Release : 2016
ISBN : 9789402801934
Pages :

Book excerpt: "Statistical machine translation (SMT) plays an important role in the automatic translation of the large and increasing volume of documents that has become globally available. The results of SMT are often still lacking in various aspects including word order. This thesis focuses on the improvement of hierarchical SMT, in particular Hiero. Hiero rules lack nonterminal labels. This gives them little context and makes their combination into full translations poorly coordinated, and strongly dependent on the language model. In this thesis, bilingual labels are added to Hiero rules. These bilingual labels lead to more coherent translations with better word order, as demonstrated by extensive experiments on three language pairs. The proposed labels require no syntactic information, and use only the information from word alignments. This distinguishes them from various types of syntactic labels earlier proposed in the literature. Bilingual labels are based on a newly proposed framework called hierarchical alignment trees (HATs). HATs are bilingual trees that represent the hierarchical translation equivalence structure induced from word alignments. HATs maximally decompose word alignments into phrase pairs, and provide an explicit description of the local reordering taking place within each phrase pair. The last part of the thesis is concerned with the complexity of empirical translation equivalence. Given a word alignment and a grammar, it studies the question what it means for the grammar to cover the word alignment. HATs play a key role in answering this question exactly and efficiently, and are applied to characterize alignment complexity for various language pairs."--Author's summary.

Word Alignment and Smoothing Methods in Statistical Machine Translation

Book Details:

Author : Tsuyoshi Okita
Publisher :
Release : 2012
ISBN :
Pages : 133 pages

Computers

Syntax-based Statistical Machine Translation

Book Details:

Author : Philip Williams
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031021649
Pages : 190 pages

Book excerpt: This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.