EBookClubs

Read Books & Download eBooks Full Online

Book The Impact of Statistical Word Alignment Quality and Structure in Phrase Based Statistical Machine Translation

Download or read book The Impact of Statistical Word Alignment Quality and Structure in Phrase Based Statistical Machine Translation written by Francisco Javier Guzmán Herrera and published by . This book was released on 2011 with total page 121 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical Word Alignments represent lexical word-to-word translations between source and target language sentences. They are considered the starting point for many state-of-the-art Statistical Machine Translation (SMT) systems. In this dissertation, we perform an in-depth study of the impact of word alignments at different stages of the phrase-based statistical machine translation pipeline, namely word alignment, phrase extraction, phrase scoring and decoding. Moreover, we establish a multivariate prediction model for different variables of the translation model and overall translation quality using word alignment structure. Based on those models, we identify the most important alignment variables and propose two alternatives to provide more control over alignment structure and thus improve SMT. Our results show that incorporating alignment structure into decoding via alignment gap features yields significant improvements, especially in situations where translation data is limited.

Book Syntax based Statistical Machine Translation

Download or read book Syntax based Statistical Machine Translation written by Philip Williams and published by Springer Nature. This book was released on 2022-05-31 with total page 190 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.

Book Phrase Alignment Models for Statistical Machine Translation

Download or read book Phrase Alignment Models for Statistical Machine Translation written by John Sturdy DeNero and published by . This book was released on 2010 with total page 210 pages. Available in PDF, EPUB and Kindle. Book excerpt: The goal of a machine translation (MT) system is to automatically translate a document written in some human input language (e.g., Mandarin Chinese) into an equivalent document written in an output language (e.g., English). This task--so simple in its specification, and yet so rich in its complexities--has challenged computer science researchers for 60 years. While MT systems are in wide use today, the problem of producing human-quality translations remains unsolved. Statistical approaches have substantially improved the quality of MT systems by effectively exploiting parallel corpora: large collections of documents that have been translated by people, and therefore naturally occur in both the input and output languages. Broadly characterized, statistical MT systems translate an input document by matching fragments of its contents to examples in a parallel corpus, and then stitching together the translations of those fragments into a coherent document in an output language. The central challenge of this approach is to distill example translations into reusable parts: fragments of sentences that we know how to translate robustly and are likely to recur. Individual words are certainly common enough to recur, but they often cannot be translated correctly in isolation. At the other extreme, whole sentences can be translated without much context, but rarely repeat, and so cannot be recycled to build new translations. This thesis focuses on acquiring translations of phrases: contiguous sequences of a few words that encapsulate enough context to be translatable, but recur frequently in large corpora. 
We automatically identify phrase-level translations that are contained within human-translated sentences by partitioning each sentence into phrases and aligning phrases across languages. This alignment-based approach to acquiring phrasal translations gives rise to statistical models of phrase alignment. A statistical phrase alignment model assigns a score to each possible analysis of a sentence-level translation, where an analysis describes which phrases within that sentence can be translated and how to translate them. If the model assigns a high score to a particular phrasal translation, we should be willing to reuse that translation in new sentences that contain the same phrase. Chapter 1 provides a non-technical introduction to phrase alignment models and machine translation. Chapter 2 describes a complete state-of-the-art phrase-based translation system to clarify the role of phrase alignment models. The remainder of this thesis presents a series of novel models, analyses, and experimental results that together constitute a thorough investigation of phrase alignment models for statistical machine translation. Chapter 3 presents the formal properties of the class of phrase alignment models, including inference algorithms and tractability results. We present two specific models, along with statistical learning techniques to fit their parameters to data. Our experimental evaluation identifies two primary challenges to training and employing phrase alignment models, and we address each of these in turn. The first broad challenge is that generative phrase models are structured to prefer very long, rare phrases. These models require external pressure to explain observed translations using small, reusable phrases rather than large, unique ones. Chapter 4 describes three Bayesian models and a corresponding Gibbs sampler to address this challenge. These models outperform the word-level models that are widely employed in research and production MT systems. 
The second broad challenge is structural: there are many consistent and coherent ways of analyzing a translated sentence using phrases. Long phrases, short phrases, and overlapping phrases can all simultaneously express correct, translatable units. However, no previous phrase alignment models have leveraged this rich structure to predict alignments. We describe a discriminative model of multi-scale, overlapping phrases that outperforms all previously proposed models. The cumulative result of this thesis is to establish model-based phrase alignment as the most effective approach to acquiring phrasal translations. Only phrase alignment models are able to incorporate statistical signals about multi-word constructions into alignment decisions and score coherent phrasal analyses of full sentence pairs. As a result, phrase alignment models outperform classical word-level models in both generative and discriminative settings. This result is fundamental to the field: the models proposed in this thesis address a general, language-independent alignment problem that arises in all state-of-the-art statistical machine translation systems in use today.

Book Bitext Alignment

    Book Details:
  • Author : Jörg Tiedemann
  • Publisher : Morgan & Claypool Publishers
  • Release : 2011-05-05
  • ISBN : 1608455114
  • Pages : 167 pages

Download or read book Bitext Alignment written by Jörg Tiedemann and published by Morgan & Claypool Publishers. This book was released on 2011-05-05 with total page 167 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks

Book Linguistically Motivated Statistical Machine Translation

Download or read book Linguistically Motivated Statistical Machine Translation written by Deyi Xiong and published by Springer. This book was released on 2015-02-11 with total page 159 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also serves the purpose of promoting the in-depth study of the impacts of linguistic knowledge on machine translation. Finally it provides a systematic introduction of Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.

Book Discriminative Alignment Models For Statistical Machine Translation

Download or read book Discriminative Alignment Models For Statistical Machine Translation written by Nadi Tomeh and published by . This book was released on 2012 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Bitext alignment is the task of aligning a text in a source language and its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed seemingly across languages. It varies along linguistic properties and cultural backgrounds of different languages, and also depends on the translation strategy that has been used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process. 
In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to one word at most in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems which use phrases instead of words, a heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both the problems of word alignment and phrase pair extraction. We improve the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment. In this framework, links are predicted independently from one another using a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for a part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment. Additionally, input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state of the art results in terms of alignment quality as measured by the alignment error rate, and translation quality as measured by BLEU on large-scale Arabic-English NIST'09 systems. We also present a translation quality informed procedure for both extraction and evaluation of phrase pairs. We reformulate the problem in the supervised framework in which we decide for each phrase pair whether we keep it or not in the translation model. This offers a principled way to combine several features to make the procedure more robust to alignment difficulties. 
We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation. Using machine learning techniques based on positive examples only, these annotations can be used to learn phrase alignment decisions. Using this approach we obtain improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.
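
The excerpt above mentions building two one-to-many alignments in opposite directions and combining them with a heuristic. A minimal Python sketch of that idea follows. It assumes alignments are stored as sets of index pairs; the function name symmetrize and the toy links are hypothetical, and plain intersection and union are only the simplest combination heuristics, not necessarily the ones used in the dissertation.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional word alignments into symmetric candidates.

    src_to_tgt: set of (source_index, target_index) links
    tgt_to_src: set of (target_index, source_index) links
    Returns (intersection, union), both in (source, target) order.
    """
    # Flip the reverse direction so both sets use (source, target) order.
    flipped = {(i, j) for (j, i) in tgt_to_src}
    intersection = src_to_tgt & flipped  # links both directions agree on
    union = src_to_tgt | flipped         # links proposed by either direction
    return intersection, union


# Toy example (hypothetical data).
forward = {(0, 0), (1, 1), (2, 1)}
backward = {(0, 0), (1, 1), (2, 3)}  # stored as (target, source)
both, either = symmetrize(forward, backward)
print(sorted(both))
print(sorted(either))
```

The intersection tends to favour precision and the union recall; practical heuristics usually start from the intersection and add selected links from the union.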

Book Statistical Machine Translation

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2010 with total page 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook, from an active researcher in the field, provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Book On Word Alignment Models for Statistical Machine Translation

Download or read book On Word Alignment Models for Statistical Machine Translation written by Shaojun Zhao and published by . This book was released on 2011 with total page 240 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement. We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric. We also designed algorithms that mine massive and high quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system."--Leaf v.
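
For readers unfamiliar with the alignment error rate (AER) mentioned above, here is a small Python sketch of the commonly used definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted alignment, S the sure gold links and P the possible gold links (with S contained in P). The function name and the toy links are hypothetical, and the thesis may compute the score with different tooling.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER for sets of (source_index, target_index) links.

    predicted: links proposed by the aligner (A)
    sure:      gold links annotated as sure (S)
    possible:  gold links annotated as possible (P); S is added to P
    """
    possible = possible | sure  # by convention every sure link is also possible
    if not predicted and not sure:
        return 0.0  # nothing to align and nothing predicted
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Toy example (hypothetical data): one of two predicted links is correct.
predicted = {(0, 0), (1, 2)}
sure = {(0, 0), (1, 1)}
print(alignment_error_rate(predicted, sure, set()))
```

Lower values are better; an alignment that contains every sure link and nothing outside the possible links scores 0.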

Book Grammar Inference and Statistical Machine Translation

Download or read book Grammar Inference and Statistical Machine Translation written by Ye-Yi Wang and published by . This book was released on 1998 with total page 137 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "NLP researchers face a dilemma: on one side, it is unarguably accepted that languages have internal structure rather than strings of words. On the other side, they find it very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using a statistical models [sic] to depict the translation process. Most of the translation models are word-based. While the approach has achieved surprisingly good performance comparable to the best commercial systems, many questions remain in the machine translation community. Can the statistical word-based translation still perform well on language pairs with radically different linguistic structures? How would it function with less training data or with spoken languages? The thesis work investigated these questions. In summary, word-based alignment model is a major cause of errors in German-English statistical spoken language translation. To account for this problem, a structure-based alignment model is introduced. This new model takes advantages of a bilingual grammar inference algorithm, which can automatically acquire shallow phrase structures used by the model. The structure-based model can directly depict the structure difference between English and German spoken languages. It also results in focused learning of word alignment, therefore it can alleviate the sparse data problem. The structure-based model achieved 11 percent error reduction over the state-of-the-art statistical machine translation models."

Book Quality Estimation for Machine Translation

Download or read book Quality Estimation for Machine Translation written by Lucia Specia and published by Springer Nature. This book was released on 2022-05-31 with total page 148 pages. Available in PDF, EPUB and Kindle. Book excerpt: Many applications within natural language processing involve performing text-to-text transformations, i.e., given a text in natural language as input, systems are required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) to compare the system outputs against one or more reference outputs using string matching-based evaluation metrics and (ii) to build models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics are faced with the challenge that multiple good (and bad) quality outputs can be produced by text-to-text approaches for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), when systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to provide an estimate on how good or reliable the texts produced by an application are without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs. 
This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as application, since this represents most of the QE work done to date. It also briefly describes QE for several other applications, including text simplification, text summarization, grammatical error correction, and natural language generation.

Book Aligning the Foundations of Hierarchical Statistical Machine Translation

Download or read book Aligning the Foundations of Hierarchical Statistical Machine Translation written by Gideon Maillette de Buy Wenniger and published by . This book was released on 2016 with total page 269 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Statistical machine translation (SMT) plays an important role in the automatic translation of the large and increasing volume of documents that has become globally available. The results of SMT are often still lacking in various aspects including word order. This thesis focuses on the improvement of hierarchical SMT, in particular Hiero. Hiero rules lack nonterminal labels. This gives them little context and makes their combination into full translations poorly coordinated, and strongly dependent on the language model. In this thesis, bilingual labels are added to Hiero rules. These bilingual labels lead to more coherent translations with better word order, as demonstrated by extensive experiments on three language pairs. The proposed labels require no syntactic information, and use only the information from word alignments. This distinguishes them from various types of syntactic labels earlier proposed in the literature. Bilingual labels are based on a newly proposed framework called hierarchical alignment trees (HATs). HATs are bilingual trees that represent the hierarchical translation equivalence structure induced from word alignments. HATs maximally decompose word alignments into phrase pairs, and provide an explicit description of the local reordering taking place within each phrase pair. The last part of the thesis is concerned with the complexity of empirical translation equivalence. Given a word alignment and a grammar, it studies the question what it means for the grammar to cover the word alignment. HATs play a key role in answering this question exactly and efficiently, and are applied to characterize alignment complexity for various language pairs."--Samenvatting auteur.

Book Proceedings of the 4th International Conference on Big Data Analytics for Cyber Physical System in Smart City Volume 2

Download or read book Proceedings of the 4th International Conference on Big Data Analytics for Cyber Physical System in Smart City Volume 2 written by Mohammed Atiquzzaman and published by Springer Nature. This book was released on 2023-03-31 with total page 749 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book gathers a selection of peer-reviewed papers presented at the 4th Big Data Analytics for Cyber-Physical System in Smart City (BDCPS 2022) conference held in Bangkok, Thailand, on December 16–17. The contributions, prepared by an international team of scientists and engineers, cover the latest advances and challenges made in the field of big data analytics methods and approaches for the data-driven co-design of communication, computing, and control for smart cities. Given its scope, it offers a valuable resource for all researchers and professionals interested in big data, smart cities, and cyber-physical systems.

Book Reordering Metrics for Statistical Machine Translation

Download or read book Reordering Metrics for Statistical Machine Translation written by Alexandra Birch and published by . This book was released on 2011 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Natural languages display a great variety of different word orders, and one of the major challenges facing statistical machine translation is in modelling these differences. This thesis is motivated by a survey of 110 different language pairs drawn from the Europarl project, which shows that word order differences account for more variation in translation performance than any other factor. This wide ranging analysis provides compelling evidence for the importance of research into reordering. There has already been a great deal of research into improving the quality of the word order in machine translation output. However, there has been very little analysis of how best to evaluate this research. Current machine translation metrics are largely focused on evaluating the words used in translations, and their ability to measure the quality of word order has not been demonstrated. In this thesis we introduce novel metrics for quantitatively evaluating reordering. Our approach isolates the word order in translations by using word alignments. We reduce alignment information to permutations and apply standard distance metrics to compare the word order in the reference to that of the translation. We show that our metrics correlate more strongly with human judgements of word order quality than current machine translation metrics. We also show that a combined lexical and reordering metric, the LRscore, is useful for training translation model parameters. Humans prefer the output of models trained using the LRscore as the objective function, over those trained with the de facto standard translation metric, the BLEU score. The LRscore thus provides researchers with a reliable metric for evaluating the impact of their research on the quality of word order.
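
The idea of reducing alignments to permutations and comparing word order with a distance metric can be illustrated with a short Python sketch. It assumes a one-to-one alignment already expressed as a permutation, where perm[k] is the reference position of the word at translation position k. The choice of a normalized Kendall's tau distance and a normalized Hamming distance, and both function names, are assumptions made for illustration; the LRscore itself is not reproduced here.

```python
from itertools import combinations


def kendall_tau_distance(perm):
    """Fraction of word pairs whose relative order differs from the reference."""
    n = len(perm)
    if n < 2:
        return 0.0
    discordant = sum(1 for a, b in combinations(range(n), 2) if perm[a] > perm[b])
    return discordant / (n * (n - 1) / 2)


def hamming_distance(perm):
    """Fraction of words that are not in their reference position."""
    if not perm:
        return 0.0
    return sum(1 for k, p in enumerate(perm) if p != k) / len(perm)


# Toy example (hypothetical): the first two words are swapped.
perm = [1, 0, 2, 3]
print(kendall_tau_distance(perm))
print(hamming_distance(perm))
```

Both distances are 0 for a monotone translation and 1 for a fully reversed one, so either can be turned into a similarity score by subtracting it from 1.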

Book Syntax based Language Models for Statistical Machine Translation

Download or read book Syntax based Language Models for Statistical Machine Translation written by Matt Post and published by . This book was released on 2010 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: "The goal of machine translation is to develop algorithms that produce human-quality translations of natural language sentences. The evaluation of machine translation quality is split broadly into two aspects: adequacy and fluency. Adequacy measures how faithfully the meaning of the original sentence is preserved, whereas fluency measures whether this meaning is expressed in valid sentences in the target language. While both of these criteria are difficult to meet, fluency is a much more difficult goal. Generally, this likely has something to do with the asymmetrical nature of producing and understanding sentences; although humans are quite robust at inferring the meaning of text even in the presence of lots of noise and error, the rules that govern grammatical utterances are exacting, subtle, and elusive. To produce understandable text, we can rely on this robust processing hardware, but to produce grammatical text, we have to understand how it works. This dissertation attempts to improve the fluency of machine translation output by explicitly incorporating models of the target language structure into machine translation systems. It is organized into three parts. First, we propose a framework for decoding that decouples the structures of the sentences of the source and target languages, and evaluate it with existing grammatical models as language models for machine translation. Next, we apply lessons from that task to the learning of grammars more suitable to the demands of the machine translation. We then incorporate these grammars, called Tree Substitution Grammars, into our decoding framework."--Leaf vi

Book Statistical Phrase Based Translation

Download or read book Statistical Phrase Based Translation written by and published by . This book was released on 2003 with total page 8 pages. Available in PDF, EPUB and Kindle. Book excerpt: We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
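
The heuristic learning of phrase translations from word alignments can be sketched in Python as follows. This is a simplified illustration, not the paper's exact procedure: it keeps a phrase pair only if no alignment link crosses its boundary, it does not extend phrases over unaligned words, and the function name, the max_len parameter and the toy sentence pair are all hypothetical.

```python
def extract_phrase_pairs(src, tgt, links, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens
    links:    set of (source_index, target_index) alignment links
    max_len:  maximum phrase length on either side
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [j for (i, j) in links if s_start <= i <= s_end]
            if not tgt_pos:
                continue
            t_start, t_end = min(tgt_pos), max(tgt_pos)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span may link outside the source span.
            crosses = any(
                t_start <= j <= t_end and not (s_start <= i <= s_end)
                for (i, j) in links
            )
            if crosses:
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Toy example (hypothetical) with a local reordering.
src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
links = {(0, 0), (1, 2), (2, 1)}
for pair in sorted(extract_phrase_pairs(src, tgt, links)):
    print(pair)
```

In this example the span "la maison" is rejected because "blue" falls inside its target span but is aligned to "bleue", which lies outside the source span.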

Book Statistical Machine Translation of English Text to API Code Usages

Download or read book Statistical Machine Translation of English Text to API Code Usages written by Dharani Kumar Palani and published by . This book was released on 2018 with total page 84 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical Machine Translation (SMT) has gained enormous popularity in recent years as natural language translations have become increasingly accurate. In this thesis we apply SMT techniques in the context of translating English descriptions of programming tasks to source code. We evaluate four existing approaches: maximum likelihood word maps, ContextualExpansion, phrase-based, and neural network translation. As a training and test (i.e. reference translation) data set we clean and align the popular developer discussion forum StackOverflow. Our baseline approach, WordMapK, uses a simple maximum likelihood word map model which is then ordered using existing code usage graphs. The approach is quite effective, with a precision and recall of 20 and 50, respectively. Adding context to the word map model, ContextualExpansion, is able to increase the precision to 25 with a recall of 40. The traditional phrase-based translation model, Moses, achieves a similar precision and recall also incorporating the context of the input text by mapping English sequences to code sequences. The final approach is neural network translation, OpenNMT. While the median precision is 100 the recall is only 20. When manually examining the output of the neural translation, the code usages are very small and obvious. Our results represent an application of existing natural language strategies in the context of software engineering. We make our scripts, corpus, and reference translations available in the hope that future work will adapt these techniques to further increase the quality of English to code statistical machine translation.
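
The precision and recall figures quoted above can be made concrete with a small Python sketch. It assumes, purely for illustration, that a predicted translation and its reference are compared as sets of API call names; the thesis may score its outputs differently, and the function name and the API names below are hypothetical.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall of predicted API usages."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Toy example (hypothetical API names).
predicted = ["File.open", "File.read", "Socket.connect", "List.add"]
reference = ["File.open", "File.close"]
p, r = precision_recall(predicted, reference)
print(f"precision={p:.0%} recall={r:.0%}")
```

A system that emits only a few very safe usages, as described for the neural approach, would score high precision but low recall under this kind of measure.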