Statistical Post-Editing and Quality Estimation for Machine Translation Systems

Computers

Quality Estimation for Machine Translation

Book Details:

Author : Lucia Specia
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031021681
Pages : 148 pages

Download or read book Quality Estimation for Machine Translation written by Lucia Specia and published by Springer Nature. This book was released on 2022-05-31 with a total of 148 pages. Available in PDF, EPUB and Kindle. Book excerpt: Many applications within natural language processing involve performing text-to-text transformations, i.e., given a text in natural language as input, systems are required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) to compare the system outputs against one or more reference outputs using string matching-based evaluation metrics and (ii) to build models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics face the challenge that text-to-text approaches can produce multiple good (and bad) quality outputs for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), where systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to provide an estimate of how good or reliable the texts produced by an application are without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs.
This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as the main application, since this represents most of the QE work done to date. It also briefly describes QE for several other applications, including text simplification, text summarization, grammatical error correction, and natural language generation.
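To make the idea of reference-free QE concrete, here is a minimal sketch, assuming two toy "black-box" features (target/source length ratio and an out-of-vocabulary rate against a hypothetical target vocabulary) and a plain least-squares linear model fitted to human quality labels. The function names, features and synthetic training numbers are illustrative assumptions, not the book's method; real QE systems use far richer features or learnt representations.

```python
from typing import List, Sequence, Set


def extract_features(source: str, target: str, target_vocab: Set[str]) -> List[float]:
    """Two illustrative (hypothetical) QE features for a (source, MT output) pair."""
    src, tgt = source.split(), target.split()
    length_ratio = len(tgt) / max(len(src), 1)
    # Share of MT output tokens not found in an assumed target vocabulary.
    oov_rate = sum(tok not in target_vocab for tok in tgt) / max(len(tgt), 1)
    return [length_ratio, oov_rate]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_qe_model(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit [bias, w1, ..., wk] to quality labels via the normal equations."""
    rows = [[1.0] + list(features) for features in x]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * label for r, label in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_quality(weights: Sequence[float], features: Sequence[float]) -> float:
    """Predict a quality score for an unseen output, with no reference translation."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


if __name__ == "__main__":
    vocab = {"the", "house", "is", "small"}
    print(extract_features("das Haus ist klein", "the house is smal", vocab))
    # Synthetic feature vectors and labels, for illustration only.
    train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    train_y = [0.6, -0.2, 0.3, 0.8]
    weights = fit_qe_model(train_x, train_y)
    print(predict_quality(weights, [1.5, 0.5]))
```

The closed-form fit keeps the sketch dependency-free; in practice one would swap in any regression learner over the same feature vectors.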

Statistical Post-editing and Quality Estimation for Machine Translation Systems

Book Details:

Author : Hanna Bechara
Publisher :
Release : 2014
ISBN :
Pages : 72 pages

Download or read book Statistical Post-editing and Quality Estimation for Machine Translation Systems written by Hanna Bechara (publisher not listed). This book was released in 2014 with a total of 72 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical post-editing (SPE) has been applied successfully to RBMT systems and, with less success, to some SMT systems. This thesis investigates the impact of SPE on SMT systems. We apply SPE to an SMT system using a new context-modelling approach that preserves some aspects of source information in the second-stage translation. This technique yields mixed results and fails to consistently improve the output over the baseline. Furthermore, we compare the results to those of an RBMT+SPE system and a pure SMT system, using both automatic and human evaluation methods. Results show that while automatic evaluation metrics favour a pure SMT system, manual evaluators prefer the output of the combined RBMT+SPE system. We investigate the use of machine learning methods to predict which sentences would benefit from post-editing. However, as the oracle score for selecting between SMT and SMT+SPE was not much higher than that of either system alone, we decided to compare two systems with a higher upper bound. Combining our analysis with machine learning techniques for quality estimation, we are able to improve the overall output by automatically selecting the best sentences from each of the SMT and RBMT+SPE systems.
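The sentence-selection idea and the oracle upper bound mentioned above can be sketched as follows. This assumes each source sentence has exactly one candidate per system and that a QE model has already assigned each candidate a predicted score (higher is better); the system names, helper functions and numbers are hypothetical and not taken from the thesis.

```python
from typing import Dict, List


def select_by_qe(candidates: List[Dict[str, str]],
                 predicted: List[Dict[str, float]]) -> List[str]:
    """For each sentence, keep the system output with the highest predicted quality."""
    selected = []
    for outputs, scores in zip(candidates, predicted):
        best_system = max(scores, key=scores.get)
        selected.append(outputs[best_system])
    return selected


def oracle_score(true_scores: List[Dict[str, float]]) -> float:
    """Upper bound: mean quality if the truly best output were always chosen."""
    return sum(max(s.values()) for s in true_scores) / len(true_scores)


def system_score(true_scores: List[Dict[str, float]], system: str) -> float:
    """Mean quality of a single system used on its own."""
    return sum(s[system] for s in true_scores) / len(true_scores)


if __name__ == "__main__":
    # Hypothetical outputs and scores for two sentences.
    candidates = [{"SMT": "smt-1", "RBMT+SPE": "spe-1"},
                  {"SMT": "smt-2", "RBMT+SPE": "spe-2"}]
    predicted = [{"SMT": 0.7, "RBMT+SPE": 0.4}, {"SMT": 0.2, "RBMT+SPE": 0.9}]
    print(select_by_qe(candidates, predicted))
    print(oracle_score(predicted), system_score(predicted, "SMT"))
```

Comparing `oracle_score` with each `system_score` shows how much room a selector has: if the oracle barely beats the best single system, even a perfect QE model cannot help much, which is the reasoning the excerpt gives for switching system pairs.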

Computers

Post-editing of Machine Translation

Book Details:

Author : Laura Winther Balling
Publisher : Cambridge Scholars Publishing
Release : 2014-03-17
ISBN : 1443857971
Pages : 335 pages

Download or read book Post-editing of Machine Translation written by Laura Winther Balling and published by Cambridge Scholars Publishing. This book was released on 2014-03-17 with a total of 335 pages. Available in PDF, EPUB and Kindle. Book excerpt: Post-editing is possibly the oldest form of human-machine cooperation for translation. It has been a common practice for just about as long as operational machine translation systems have existed. Recently, however, there has been a surge of interest in post-editing among the wider user community, partly due to the increasing quality of machine translation output, but also due to the availability of free, reliable software for both machine translation and post-editing. As a result, the practices and processes of the translation industry are changing in fundamental ways. This volume is a compilation of work by researchers, developers and practitioners of post-editing, presented at two recent events on post-editing: the first Workshop on Post-editing Technology and Practice, held in conjunction with the 10th Conference of the Association for Machine Translation in the Americas in San Diego in 2012; and the International Workshop on Expertise in Translation and Post-editing Research and Application, held at the Copenhagen Business School in 2012.

Word Confidence Estimation and Its Applications in Statistical Machine Translation

Book Details:

Author : Ngoc Quang Luong
Publisher :
Release : 2014
ISBN :
Pages : not listed

Download or read book Word Confidence Estimation and Its Applications in Statistical Machine Translation written by Ngoc Quang Luong (publisher not listed). This book was released in 2014; no page count is listed. Available in PDF, EPUB and Kindle. Book excerpt: Machine Translation (MT) systems, which automatically translate each source sentence into a target language, have achieved impressive gains in recent decades and are becoming effective language assistants for the entire community in a globalized world. Nonetheless, for various reasons, MT quality is still generally imperfect, and end users therefore want to know how much they should trust a specific translation. A method capable of pointing out the correct parts, detecting translation errors, and judging the overall quality of each MT hypothesis is beneficial not only for end users but also for translators, post-editors, and MT systems themselves. Such a method is widely known as Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building such automatic estimation methods stems from the drawbacks of assessing MT quality manually: the task is time-consuming, costly in effort, and sometimes impossible when readers have little or no knowledge of the source language. This thesis focuses mostly on CE methods at the word level (WCE). A WCE classifier assigns a quality label to each word in the MT output. The WCE mechanism is straightforward: a classifier, trained beforehand on a number of features using ML methods, computes a confidence score for each label for each MT output word and then tags the word with the highest-scoring label. WCE is increasingly important in many aspects of MT. First, it helps post-editors quickly identify translation errors, thereby improving their productivity.
Second, it warns readers about unreliable portions of a sentence, helping them avoid misunderstanding its content. Third, it helps select the best translation among the outputs of multiple MT systems. Finally, WCE scores can help improve MT quality in several scenarios, such as N-best list re-ranking and search graph re-decoding. In this thesis, we aim to build and optimize a baseline WCE system and then exploit it to improve MT and Sentence Confidence Estimation (SCE). Compared to previous approaches, our novel contributions cover the following main points. First, we integrate various types of prediction indicators, namely system-based features extracted from the MT system together with lexical, syntactic and semantic features, to build the baseline WCE systems. We also apply multiple Machine Learning (ML) models to the entire feature set and compare their performance to select the best one for further optimization. Second, the usefulness of each feature is investigated in more depth using a greedy feature selection algorithm. Third, we propose using a Boosting algorithm as the learning method to strengthen the contribution of dominant feature subsets, thereby improving the system's predictive capability. Lastly, we explore how WCE can improve MT quality in several scenarios. In N-best list re-ranking, we synthesize scores from the WCE output and combine them with the decoder scores to recompute the objective function, then re-order the N-best list to choose a better candidate. In search graph re-decoding, we apply WCE scores directly to the graph nodes containing each word, updating their cost according to word quality. Furthermore, WCE scores are used to build useful features that can enhance the performance of the Sentence Confidence Estimation system.
Overall, our work provides an insightful, multidimensional picture of word quality prediction and its positive impact on various aspects of Machine Translation. The promising results open up broad avenues where WCE can play a role, such as WCE for Automatic Speech Recognition (ASR) systems, WCE for selection among multiple MT outputs, and WCE for re-trainable and self-learning MT systems.
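The word labelling and N-best re-ranking steps described above can be sketched as follows. The specific combination is an assumption for illustration, not the thesis's exact formulation: each word carries a WCE probability of being "Good", a word is tagged Good when that probability is at least 0.5 (i.e. the higher-scoring of two labels), the sentence-level WCE feature is the fraction of Good words, and it is added to the decoder's log score with a hypothetical tunable `weight`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]
    decoder_score: float      # e.g. the decoder's model log score
    good_probs: List[float]   # WCE probability of "Good" for each word


def wce_labels(hyp: Hypothesis, threshold: float = 0.5) -> List[str]:
    """Tag each word with the higher-scoring of the two labels Good / Bad."""
    return ["Good" if p >= threshold else "Bad" for p in hyp.good_probs]


def wce_feature(hyp: Hypothesis) -> float:
    """Sentence-level feature: fraction of words tagged Good."""
    labels = wce_labels(hyp)
    return labels.count("Good") / max(len(labels), 1)


def rerank(nbest: List[Hypothesis], weight: float = 1.0) -> List[Hypothesis]:
    """Re-order the N-best list by decoder score plus the weighted WCE feature."""
    return sorted(nbest,
                  key=lambda h: h.decoder_score + weight * wce_feature(h),
                  reverse=True)


if __name__ == "__main__":
    nbest = [Hypothesis(["the", "house", "big"], -1.0, [0.9, 0.8, 0.2]),
             Hypothesis(["the", "big", "house"], -1.1, [0.9, 0.7, 0.8])]
    for hyp in rerank(nbest):
        print(" ".join(hyp.words), wce_labels(hyp))
```

With `weight=0.0` the original decoder order is kept, so the weight controls how much the word-level confidence is allowed to override the decoder.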

Computers

Statistical Machine Translation

Book Details:

Author : Philipp Koehn
Publisher : Cambridge University Press
Release : 2010
ISBN : 0521874157
Pages : 447 pages

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released in 2010 with a total of 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. It is ideal for undergraduate and graduate students, or for anyone interested in the latest developments in machine translation.
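As an illustration of the word-based models mentioned above, here is a minimal sketch of expectation-maximisation training for IBM Model 1, the simplest of them. It simplifies the standard model by omitting the NULL word and running a fixed number of iterations; the function name and the toy corpus are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Corpus = List[Tuple[List[str], List[str]]]  # (foreign sentence, English sentence)


def train_ibm1(corpus: Corpus, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Return lexical translation probabilities t(e|f), keyed by (e, f)."""
    e_vocab = {e for _, es in corpus for e in es}
    # Uniform initialisation of t(e|f) over the English vocabulary.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count(e, f)
        total = defaultdict(float)  # expected total(f)
        # E-step: distribute each English word over the foreign words it may align to.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        # M-step: re-estimate t(e|f) from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    corpus = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"]),
              (["ein", "buch"], ["a", "book"])]
    t = train_ibm1(corpus)
    for (e, f), p in sorted(t.items(), key=lambda kv: -kv[1]):
        print(f"t({e}|{f}) = {p:.3f}")
```

Because "das" co-occurs with "the" in two sentence pairs, EM shifts probability mass towards t(the|das) over the iterations, even though no word alignments are given.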

Language Arts & Disciplines

Machine Learning in Translation

Book Details:

Author : Peng Wang
Publisher : Taylor & Francis
Release : 2023-04-12
ISBN : 100083865X
Pages : 219 pages

Download or read book Machine Learning in Translation written by Peng Wang and published by Taylor & Francis. This book was released on 2023-04-12 with a total of 219 pages. Available in PDF, EPUB and Kindle. Book excerpt: Machine Learning in Translation introduces machine learning (ML) theories and technologies that are most relevant to translation processes, approaching the topic from a human perspective and emphasizing that ML and ML-driven technologies are tools for humans. Providing an exploration of the common ground between human and machine learning and of the nature of translation that leverages this new dimension, this book helps linguists, translators, and localizers better find their added value in an ML-driven translation environment. Part One explores how humans and machines approach the problem of translation in their own particular ways, in terms of word embeddings, chunking of larger meaning units, and prediction in translation based upon the broader context. Part Two introduces key tasks, including machine translation, translation quality assessment and quality estimation, and other Natural Language Processing (NLP) tasks in translation. Part Three focuses on the role of data in both human and machine learning processes. It proposes that a translator’s unique value lies in the capability to create, manage, and leverage language data in different ML tasks in the translation process. It outlines new knowledge and skills that need to be incorporated into traditional translation education in the machine learning era. The book concludes with a discussion of human-centered machine learning in translation, stressing the need to empower translators with ML knowledge, through communication with ML users, developers, and programmers, and with opportunities for continuous learning.
This accessible guide is designed for current and future users of ML technologies in localization workflows, including students on courses in translation and localization, language technology, and related areas. It supports the professional development of translation practitioners, so that they can fully utilize ML technologies and design their own human-centered ML-driven translation workflows and NLP tasks.

Computers

Syntax-based Statistical Machine Translation

Book Details:

Author : Philip Williams
Publisher : Morgan & Claypool Publishers
Release : 2016-08-01
ISBN : 1627055029
Pages : 211 pages

Download or read book Syntax-based Statistical Machine Translation written by Philip Williams and published by Morgan & Claypool Publishers. This book was released on 2016-08-01 with a total of 211 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.
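Cube pruning, named above as one of the search approximations, builds on a lazy, heap-based enumeration of the best combinations of two sorted lists of sub-hypothesis scores. The sketch below shows only that core, under the simplifying assumption that combination scores are purely additive; in that case the enumeration is exact, and it is the non-local language-model score added in real decoders that makes cube pruning approximate. The function name is hypothetical.

```python
import heapq
from typing import List, Tuple


def k_best_combinations(left: List[float], right: List[float],
                        k: int) -> List[Tuple[float, int, int]]:
    """Return up to k (score, i, j) triples with the highest left[i] + right[j].

    Both lists must already be sorted in descending order of score.
    """
    if not left or not right or k <= 0:
        return []
    heap = [(-(left[0] + right[0]), 0, 0)]  # max-heap via negated scores
    seen = {(0, 0)}
    results = []
    while heap and len(results) < k:
        neg_score, i, j = heapq.heappop(heap)
        results.append((-neg_score, i, j))
        # Explore the two neighbours of the popped cell in the "cube" grid.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni] + right[nj]), ni, nj))
    return results


if __name__ == "__main__":
    print(k_best_combinations([5, 3, 1], [4, 2], 3))
```

Only O(k) cells of the grid are ever scored, rather than all `len(left) * len(right)` combinations, which is where the efficiency gain comes from.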

Cross-language information retrieval

Automatic Improvement of Machine Translation Systems

Book Details:

Author : Ariadna Font Llitjós
Publisher :
Release : 2007
ISBN :
Pages : 394 pages

Download or read book Automatic Improvement of Machine Translation Systems written by Ariadna Font Llitjós (publisher not listed). This book was released in 2007 with a total of 394 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "Achieving high translation quality remains the most daunting challenge Machine Translation (MT) systems currently face. Researchers have explored a variety of methods for including translator feedback in the MT loop. However, most MT systems have failed to incorporate post-editing efforts beyond the addition of corrected translations to the parallel training data for Example-Based and Statistical systems or to a translation memory database. This thesis describes a novel approach that utilizes post-editing information to automatically improve the underlying rules and lexical entries of a Transfer-Based MT system. This process can be divided into two main steps. First, an online translation correction tool allows for easy error diagnosis and implicit error categorization. Then, an Automatic Rule Refiner performs error remediation by tracing errors back to the problematic rules and lexical entries and executing repairs that are mostly lexical and morpho-syntactic in nature (such as word order, missing constituents or incorrect agreement in transfer rules). This approach directly improves the intelligibility of corrected MT output and, more significantly, it generalizes over unseen data, providing improved MT output for similar sentences that have not been corrected. Experimental results on an English-Spanish MT system show that automatic rule refinements triggered by bilingual speaker corrections successfully translate unseen data that was incorrectly translated by the original, unrefined grammar. Improvements in translation quality over a baseline, as measured by standard automatic evaluation metrics, are statistically significant on a paired two-tailed t-test (p = 0.0051).
One practical application of this research is extending and refining relatively small translation grammars for resource-poor languages, such as Mapudungun and Quechua, into a major language, such as English or Spanish. Initial experimental results on a Spanish-Mapudungun MT system show that rule refinement operations generalize well to a different language pair and are able to correct errors in the grammar and the lexicon."
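The significance claim above rests on a paired two-tailed t-test. The sketch below shows how the paired t statistic and its degrees of freedom are computed from matched score pairs (for example, metric scores of the refined and unrefined systems on the same test items); which units are paired in the thesis is not stated here, so the pairing, the function name and the example numbers are assumptions. The two-tailed p-value then comes from the t distribution with n - 1 degrees of freedom, for instance via SciPy's `scipy.stats.ttest_rel`, which is not reproduced here.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(system: Sequence[float],
                       baseline: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for matched per-item score pairs."""
    if len(system) != len(baseline) or len(system) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(system, baseline)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical per-item scores, for illustration only.
    refined = [0.31, 0.28, 0.35, 0.30, 0.33]
    unrefined = [0.29, 0.27, 0.31, 0.30, 0.30]
    print(paired_t_statistic(refined, unrefined))
```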

Computers

Translation Quality Assessment

Book Details:

Author : Joss Moorkens
Publisher : Springer
Release : 2018-07-13
ISBN : 3319912410
Pages : 292 pages

Download or read book Translation Quality Assessment written by Joss Moorkens and published by Springer. This book was released on 2018-07-13 with a total of 292 pages. Available in PDF, EPUB and Kindle. Book excerpt: This is the first volume that brings together research and practice from academic and industry settings and a combination of human and machine translation evaluation. Its comprehensive collection of papers by leading experts in human and machine translation quality and evaluation, who situate current developments and chart future trends, fills a clear gap in the literature. This is critical to the successful integration of translation technologies in the industry today, where the lines between human and machine are becoming increasingly blurred by technology: this affects the whole translation landscape, from students and trainers to project managers and professionals, including in-house and freelance translators, as well as, of course, translation scholars and researchers. The editors have broad experience in translation quality evaluation research, including investigations into professional practice with qualitative and quantitative studies, and the contributors are leading experts in their respective fields, providing a unique set of complementary perspectives on human and machine translation quality and evaluation, combining theoretical and applied approaches.

Technology & Engineering

Information Systems Design and Intelligent Applications

Book Details:

Author : J. K. Mandal
Publisher : Springer
Release : 2015-01-20
ISBN : 8132222474
Pages : 858 pages

Download or read book Information Systems Design and Intelligent Applications written by J. K. Mandal and published by Springer. This book was released on 2015-01-20 with a total of 858 pages. Available in PDF, EPUB and Kindle. Book excerpt: The second International Conference on INformation Systems Design and Intelligent Applications (INDIA 2015) was held in Kalyani, India, on January 8-9, 2015. The book covers all aspects of information system design, computer science and technology, general sciences, and educational research. After a double-blind review process, a number of high-quality papers were selected and collected in this book, which is composed of two volumes and covers a variety of topics, including natural language processing, artificial intelligence, security and privacy, communications, wireless and sensor networks, microelectronics, circuits and systems, machine learning, soft computing, mobile computing and applications, cloud computing, software engineering, graphics and image processing, rural engineering, e-commerce, e-governance, business computing, molecular computing, nano computing, chemical computing, intelligent computing for GIS and remote sensing, bio-informatics and bio-computing. These fields are not limited to computer research but also include mathematics, chemistry, biology, biochemistry, engineering, statistics, and all other areas in which computer techniques may assist.

Computers

Repairing Texts

Book Details:

Author : Hans P. Krings
Publisher : Kent State University Press
Release : 2001
ISBN : 9780873386715
Pages : 656 pages

Download or read book Repairing Texts written by Hans P. Krings and published by Kent State University Press. This book was released in 2001 with a total of 656 pages. Available in PDF, EPUB and Kindle. Book excerpt: This study challenges the idea that, given the effectiveness of machine translation, major costs could be reduced by using monolingual staff to post-edit translations. It presents studies of machine translation systems and current research into the translation process.

Machine Translation Systems

Book Details:

Author : Jonathan Slocum
Publisher :
Release : 1988
ISBN :
Pages : not listed

Download or read book Machine Translation Systems written by Jonathan Slocum (publisher not listed). This book was released in 1988; no page count is listed. Available in PDF, EPUB and Kindle.

Directing Post-Editors' Attention To Machine Translation Output That Needs Editing Through An Enhanced User Interface

Book Details:

Author : Devin Robert Gilbert
Publisher :
Release : 2022
ISBN :
Pages : not listed

Download or read book Directing Post-Editors' Attention To Machine Translation Output That Needs Editing Through An Enhanced User Interface written by Devin Robert Gilbert (publisher not listed). This book was released in 2022; no page count is listed. Available in PDF, EPUB and Kindle. Book excerpt: Post-editing of machine translation (MT) is a workflow that is being used for an increasing number of text types and domains (Koponen, 2016; Hu, 2020; Zouhar et al., 2021), but the sections of text that post-editors need to fix have become harder to detect due to the more human-like fluency that neural machine translation (NMT) affords (Comparin & Mendes, 2017; Yamada, 2019). This dissertation seeks to address this problem by developing a word-level machine translation quality estimation (MTQE) system that highlights words in raw MT output that need editing, in order to aid post-editors. This MTQE system is then tested in a large-scale post-editing experiment to determine whether it increases productivity and decreases cognitive effort and error rate. The MTQE system is based on two automatically generated features: word translation entropy, generated from the output of multiple MT systems (a feature that has never been used in MTQE), and word class (based on part-of-speech tags). For the post-editing experiment, a within-subjects design assigns raw MT output to participants under three different conditions. The two experimental conditions consist of MT output enhanced with highlighting around the stretches of text that likely need to be edited. In the first experimental condition, this highlighting is supplied automatically by the MTQE system; in the second, it is supplied by an experienced translator, indicating what text needs editing. The control condition consists of MT output without highlighting.
Participants post-edit three experimental texts in Trados Studio while time-stamped keystroke logs are gathered (these are later integrated into the CRITT Translation Process Research Database (TPR-DB)), and various measures of temporal, technical, cognitive, and perceived effort, as well as group editing activity, are used to assess the efficacy and usefulness of highlighting potential errors in the post-editing user interface.
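The word translation entropy feature described above can be sketched as the Shannon entropy of how several MT systems translate the same source word: if all systems agree, entropy is zero; the more they disagree, the higher it gets. This sketch assumes the per-word translations have already been collected (for example from word alignments, which are not computed here), flags source words only (projecting the flags onto the MT output is omitted), and uses a hypothetical threshold of 1 bit; none of these choices are taken from the dissertation.

```python
import math
from collections import Counter
from typing import Dict, List


def translation_entropy(translations: List[str]) -> float:
    """Shannon entropy (in bits) of the distribution of translation choices."""
    counts = Counter(translations)
    total = len(translations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def words_to_highlight(aligned: Dict[str, List[str]],
                       threshold: float = 1.0) -> List[str]:
    """Flag source words whose entropy reaches a (hypothetical) threshold."""
    return [word for word, ts in aligned.items()
            if translation_entropy(ts) >= threshold]


if __name__ == "__main__":
    # Hypothetical translations of each source word by four MT systems.
    aligned = {
        "Haus": ["house", "house", "house", "house"],
        "Bank": ["bank", "bench", "bank", "shore"],
    }
    for word, ts in aligned.items():
        print(word, round(translation_entropy(ts), 3))
    print(words_to_highlight(aligned))
```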

Computers

Neural Machine Translation

Book Details:

Author : Philipp Koehn
Publisher : Cambridge University Press
Release : 2020-06-18
ISBN : 1108497322
Pages : 409 pages

Download or read book Neural Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2020-06-18 with a total of 409 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Computers

Machine Translation

Book Details:

Author : Jinsong Su
Publisher : Springer Nature
Release : 2021-10-29
ISBN : 9811675120
Pages : 137 pages

Download or read book Machine Translation written by Jinsong Su and published by Springer Nature. This book was released on 2021-10-29 with a total of 137 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 17th China Conference on Machine Translation, CCMT 2021, held in Xining, China, in October 2021. The 10 papers presented in this volume were carefully reviewed and selected from 25 submissions and focus on all aspects of machine translation, including preprocessing, neural machine translation models, hybrid models, evaluation methods, and post-editing.

Language Arts & Disciplines

A short guide to post-editing

Book Details:

Author : Jean Nitzke
Publisher : Language Science Press
Release :
ISBN : 396110333X
Pages : 104 pages

Download or read book A short guide to post-editing written by Jean Nitzke and published by Language Science Press. The release date is not listed; the book has a total of 104 pages. Available in PDF, EPUB and Kindle. Book excerpt: Artificial intelligence is changing and will continue to change the world we live in. These changes are also influencing the translation market. Machine translation (MT) systems automatically translate text from one language into another within seconds. However, MT systems are very often still not capable of producing perfect translations. To achieve high-quality translations, the MT output first has to be corrected by a professional translator. This procedure is called post-editing (PE). PE has become an established task on the professional translation market. The aim of this textbook is to provide basic knowledge about the most relevant topics in professional PE. The textbook comprises ten chapters on both theoretical and practical aspects, including topics like MT approaches and development, guidelines, integration into CAT tools, risks in PE, data security, practical decisions in the PE process, competences for PE, and new job profiles.

Language Arts & Disciplines

Linguistically Motivated Statistical Machine Translation

Book Details:

Author : Deyi Xiong
Publisher : Springer
Release : 2015-02-11
ISBN : 9812873562
Pages : 159 pages

Download or read book Linguistically Motivated Statistical Machine Translation written by Deyi Xiong and published by Springer. This book was released on 2015-02-11 with a total of 159 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing three essential components: the translation, reordering and bracketing models. It also promotes in-depth study of the impact of linguistic knowledge on machine translation. Finally, it provides a systematic introduction to Bracketing Transduction Grammar (BTG)-based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.
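BTG-based SMT builds translations bottom-up by merging adjacent blocks with one of two rules: "straight", which keeps the target order of the two blocks, and "inverted", which swaps it, while the source side is always concatenated in order. The sketch below shows only these two merge rules; the orientation probability is a hypothetical input standing in for the output of a reordering model, and the pinyin example is an illustrative assumption.

```python
from typing import List, Tuple

Block = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def merge(left: Block, right: Block, orientation: str) -> Block:
    """Combine two adjacent blocks with the straight or inverted BTG rule."""
    source = left[0] + right[0]  # source side always stays in order
    if orientation == "straight":
        target = left[1] + right[1]
    elif orientation == "inverted":
        target = right[1] + left[1]
    else:
        raise ValueError(f"unknown orientation: {orientation}")
    return source, target


def merge_with_reordering_model(left: Block, right: Block,
                                p_straight: float) -> Block:
    """Use whichever orientation a (hypothetical) reordering model prefers."""
    orientation = "straight" if p_straight >= 0.5 else "inverted"
    return merge(left, right, orientation)


if __name__ == "__main__":
    # Chinese places the locative phrase before the verb; English puts it after.
    left = (["zai", "Beijing"], ["in", "Beijing"])
    right = (["kaihui"], ["meet"])
    print(merge_with_reordering_model(left, right, p_straight=0.2))
```

Applying these rules recursively over a bracketing of the source sentence is what lets BTG cover reorderings of nested constituents with only two rule types.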