EBookClubs

Read Books & Download eBooks Full Online

Book Data Selection for Statistical Machine Translation

Download or read book Data Selection for Statistical Machine Translation written by Mirela-Stefania Duma. This book was released on 2021. Available in PDF, EPUB and Kindle.

Book Data Selection for Statistical Machine Translation

Download or read book Data Selection for Statistical Machine Translation written by Amittai Axelrod. This book was released on 2014 with total page 124 pages. Available in PDF, EPUB and Kindle. Book excerpt: Machine translation, the computerized translation of one human language to another, could be used to communicate between the thousands of languages used around the world. Statistical machine translation (SMT) is an approach to building these translation engines without much human intervention, and large-scale implementations by Google, Microsoft, and Facebook in their products are used by millions daily. The quality of SMT systems depends on the example translations used to train the models. Data can come from a variety of sources, many of which are not optimal for common specific tasks. The goal is to be able to find the right data to use to train a model for a particular task. This work determines the most relevant subsets of these large datasets with respect to a translation task, enabling the construction of task-specific translation systems that are more accurate and easier to train than the large-scale models. Three methods are explored for identifying task-relevant translation training data from a general data pool. The first uses only a language model to score the training data according to lexical probabilities, improving on prior results by using a bilingual score that accounts for differences between the target domain and the general data. The second is a topic-based relevance score that is novel for SMT, using topic models to project texts into a latent semantic space. These semantic vectors are then used to compute similarity of sentences in the general pool to the target task. This work finds that what the automatic topic models capture for some tasks is actually the style of the language, rather than task-specific content words. This motivates the third approach, a novel style-based data selection method. Hybrid word and part-of-speech (POS) representations of the two corpora are constructed by retaining the discriminative words and using POS tags as a proxy for the stylistic content of the infrequent words. Language models based on these representations can be used to quantify the underlying stylistic relevance between two texts. Experiments show that style-based data selection can outperform the current state-of-the-art method for task-specific data selection, in terms of SMT system performance and vocabulary coverage. Taken together, the experimental results indicate that it is important to characterize corpus differences when selecting data for statistical machine translation.
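The hybrid word and POS representation mentioned in the excerpt above can be illustrated with a minimal sketch. This is not the author's implementation: the function name hybrid_representation, the min_count threshold, and the toy tagged corpus are all hypothetical, and a plain frequency cutoff stands in for whatever criterion the thesis uses to decide which words are discriminative. The input is assumed to be already POS-tagged.

```python
from collections import Counter


def hybrid_representation(tagged_sentences, min_count=2):
    """Keep frequent words as-is and replace infrequent words with their POS tag.

    tagged_sentences: list of sentences, each a list of (word, pos_tag) pairs.
    min_count: hypothetical frequency cutoff; words seen fewer times than this
               are replaced by their tag (an assumption, not the thesis criterion).
    """
    # Count lowercased word frequencies over the whole corpus.
    counts = Counter(
        word.lower() for sentence in tagged_sentences for word, _ in sentence
    )
    # Emit the word if it is frequent enough, otherwise its POS tag.
    return [
        [word.lower() if counts[word.lower()] >= min_count else tag
         for word, tag in sentence]
        for sentence in tagged_sentences
    ]


# Toy corpus with made-up sentences and Penn Treebank style tags.
corpus = [
    [("the", "DT"), ("patient", "NN"), ("received", "VBD"), ("aspirin", "NN")],
    [("the", "DT"), ("patient", "NN"), ("refused", "VBD"), ("ibuprofen", "NN")],
]
hybrid = hybrid_representation(corpus, min_count=2)
```

In this toy corpus both sentences collapse to the same hybrid sequence, which suggests why a language model trained on such sequences might capture style rather than topic-specific vocabulary.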

Book Data Selection Using Topic Adaptation for Statistical Machine Translation

Download or read book Data Selection Using Topic Adaptation for Statistical Machine Translation written by Hitokazu Matsushita. This book was released on 2015 with total page 81 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical machine translation (SMT) requires large quantities of bitexts (i.e., bilingual parallel corpora) as training data to yield good quality translations. While obtaining a large amount of training data is critical, the similarity between training and test data also has a significant impact on SMT performance. Many SMT studies define data similarity in terms of domain-overlap, and domains are defined to be synonymous with data sources. Consequently, the SMT community has focused on domain adaptation techniques that augment small (in-domain) datasets with large datasets from other sources (hence, out-of-domain, per the definition). However, many training datasets consist of topically diverse data, and not all data contained in a single dataset are useful for translations of a specific target task.

Book Linguistically Motivated Statistical Machine Translation

Download or read book Linguistically Motivated Statistical Machine Translation written by Deyi Xiong and published by Springer. This book was released on 2015-02-11 with total page 159 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a wide variety of algorithms and models to integrate linguistic knowledge into Statistical Machine Translation (SMT). It helps advance conventional SMT to linguistically motivated SMT by enhancing the following three essential components: translation, reordering and bracketing models. It also serves the purpose of promoting the in-depth study of the impacts of linguistic knowledge on machine translation. Finally it provides a systematic introduction of Bracketing Transduction Grammar (BTG) based SMT, one of the state-of-the-art SMT formalisms, as well as a case study of linguistically motivated SMT on a BTG-based platform.

Book Statistical Machine Translation

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2010 with total page 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Book Syntax-based Statistical Machine Translation

Download or read book Syntax-based Statistical Machine Translation written by Philip Williams and published by Springer Nature. This book was released on 2022-05-31 with total page 190 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models. The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG) along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their function of structuring the search space.

Book Machine Translation

    Book Details:
  • Author : Jinsong Su
  • Publisher : Springer Nature
  • Release : 2021-10-29
  • ISBN : 9811675120
  • Pages : 137 pages

Download or read book Machine Translation written by Jinsong Su and published by Springer Nature. This book was released on 2021-10-29 with total page 137 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 17th China Conference on Machine Translation, CCMT 2021, held in Xining, China, in October 2021. The 10 papers presented in this volume were carefully reviewed and selected from 25 submissions and focus on all aspects of machine translation, including preprocessing, neural machine translation models, hybrid model, evaluation method, and post-editing.

Book Machine Translation

    Book Details:
  • Author : Muyun Yang
  • Publisher : Springer
  • Release : 2017-01-05
  • ISBN : 9811036357
  • Pages : 135 pages

Download or read book Machine Translation written by Muyun Yang and published by Springer. This book was released on 2017-01-05 with total page 135 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 12th China Workshop on Machine Translation, CWMT 2016, held in Urumqi, China, in August 2016. The 10 English papers presented in this volume were carefully reviewed and selected from 76 submissions. They deal with statistical machine translation, hybrid machine translation, machine translation evaluation, post editing, alignment, and inducing bilingual knowledge from corpora.

Book Latent Domain Models for Statistical Machine Translation

Download or read book Latent Domain Models for Statistical Machine Translation written by Hoàng Cường. This book was released on 2017 with total page 145 pages. Available in PDF, EPUB and Kindle. Book excerpt: "A data-driven approach to model translation suffers from the data mismatch problem and demands domain adaptation techniques. Given parallel training data originating from a specific domain, training an MT system on the data would result in a rather suboptimal translation for other domains. But does suboptimality of translation happen only in such an extreme scenario of domain mismatch? This dissertation shows that training SMT systems on heterogeneous corpora (e.g. EuroParl) may also result in suboptimal performance of statistical translation systems. Specifically, it is clear that a word/phrase could be translated in different ways when it comes to different domains. The translation statistics induced from word alignment models and phrase-based models, however, reflect translation preferences aggregated over diverse domains in heterogeneous corpora. In this sense, they can be considered as coarse and domain-confused statistics. This dissertation shows that domain-confused statistics may harm performance of both word alignment and phrase-based models. Another important contribution of this dissertation is to provide a principled way to address the problem. We focus on learning the translation statistics with respect to each of diverse domains (i.e. domain-focused translation statistics). With our method of domain induction for translation, we present a comprehensive study of domain adaptation for statistical machine translation, including four specific case studies: Data Selection, Phrase-Based Translation, Word Alignment and Rewarding Domain Invariance in translation. Finally, we briefly describe Scorpio, the ILLC-UvA Adaptation System submitted to an adaptation task at WMT 2016, which participated with the language pair of English-Dutch. This system consolidates the ideas in the thesis on latent variable models for adaptation. Results validate the effective adaptation performance in a competitive setting."--Author's summary.

Book Machine Translation

    Book Details:
  • Author : Xiaodong Shi
  • Publisher : Springer
  • Release : 2014-10-29
  • ISBN : 3662457016
  • Pages : 127 pages

Download or read book Machine Translation written by Xiaodong Shi and published by Springer. This book was released on 2014-10-29 with total page 127 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 10th China Workshop on Machine Translation, CWMT 2014, held in Macau, China, in November 2014. The 10 revised full English papers presented were carefully reviewed and selected from 15 submissions of English papers. The papers cover the following topics: machine translation; data selection; word segmentation; entity recognition; MT evaluation.

Book Learning Machine Translation

Download or read book Learning Machine Translation written by Cyril Goutte and published by MIT Press. This book was released on 2009 with total page 329 pages. Available in PDF, EPUB and Kindle. Book excerpt: How Machine Learning can improve machine translation: enabling technologies and new statistical techniques.

Book Neural Machine Translation

Download or read book Neural Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2020-06-18 with total page 409 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Book Use of Source Language Context in Statistical Machine Translation

Download or read book Use of Source Language Context in Statistical Machine Translation written by Rejwanul Haque and published by LAP Lambert Academic Publishing. This book was released on 2012-02 with total page 228 pages. Available in PDF, EPUB and Kindle. Book excerpt: The translation features typically used in state-of-the-art statistical machine translation (SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear phrase-based SMT (PB-SMT) and hierarchical PB-SMT (HPB-SMT) can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this book we present novel approaches to incorporate source-language contextual modelling into the state-of-the-art SMT models in order to enhance the quality of lexical selection. We investigate the effectiveness of a range of contextual features, including lexical features of neighbouring words, part-of-speech tags, supertags, sentence-similarity features, dependency information, and semantic roles. We explore a series of language pairs featuring typologically different languages, and examine the scalability of our research to larger amounts of training data.

Book Web and Big Data

    Book Details:
  • Author : Yi Cai
  • Publisher : Springer
  • Release : 2018-07-18
  • ISBN : 3319968904
  • Pages : 495 pages

Download or read book Web and Big Data written by Yi Cai and published by Springer. This book was released on 2018-07-18 with total page 495 pages. Available in PDF, EPUB and Kindle. Book excerpt: This two-volume set, LNCS 10987 and 10988, constitutes the thoroughly refereed proceedings of the Second International Joint Conference, APWeb-WAIM 2018, held in Macau, China in July 2018. The 40 full papers presented together with 30 short papers, 6 demonstration papers and 3 keynotes were carefully reviewed and selected from 168 submissions. The papers are organized around the following topics: Text Analysis, Social Networks, Recommender Systems, Information Retrieval, Machine Learning, Knowledge Graphs, Database and Web Applications, Data Streams, Data Mining and Application, Query Processing, Big Data and Blockchain.

Book Hybrid Approaches to Machine Translation

Download or read book Hybrid Approaches to Machine Translation written by Marta R. Costa-jussà and published by Springer. This book was released on 2016-07-12 with total page 208 pages. Available in PDF, EPUB and Kindle. Book excerpt: This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from different multidisciplinary areas. Nowadays, most important developments in MT are achieved by combining data-driven and rule-based techniques. These combinations typically involve hybridization of different traditional paradigms, such as the introduction of linguistic knowledge into statistical approaches to MT, the incorporation of data-driven components into rule-based approaches, or statistical and rule-based pre- and post-processing for both types of MT architectures. The book is of interest primarily to MT specialists, but also – in the wider fields of Computational Linguistics, Machine Learning and Data Mining – to translators and managers of translation companies and departments who are interested in recent developments concerning automated translation tools.

Book Fortegnelse over malerier hidrørende fra dødsboet efter Maleren Emil Krause

Download or read book Fortegnelse over malerier hidrørende fra dødsboet efter Maleren Emil Krause. This book was released on 1945 with total page 15 pages. Available in PDF, EPUB and Kindle.