EBookClubs

Read Books & Download eBooks Full Online

Book Improving Statistical Alignment and Translation Using Highly Multilingual Corpora

Download or read book Improving Statistical Alignment and Translation Using Highly Multilingual Corpora written by Camelia Ignat. This book was released on 2009. Available in PDF, EPUB and Kindle.

Book Improving Statistical Machine Translation Using Comparable Corpora

Download or read book Improving Statistical Machine Translation Using Comparable Corpora written by Matthew Garvey Snover. This book was released on 2010. Available in PDF, EPUB and Kindle.

Book Parallel Text Processing

Download or read book Parallel Text Processing written by Jean Véronis and published by Springer Science & Business Media. This book was released on 2013-03-14 with total page 417 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book evolved from the ARCADE evaluation exercise that started in 1995. The project's goal is to evaluate alignment systems for parallel texts, i.e., texts accompanied by their translation. Thirteen teams from various places around the world have participated so far and for the first time, some ten to fifteen years after the first alignment techniques were designed, the community has been able to get a clear picture of the behaviour of alignment systems. Several chapters in this book describe the details of competing systems, and the last chapter is devoted to the description of the evaluation protocol and results. The remaining chapters were especially commissioned from researchers who have been major figures in the field in recent years, in an attempt to address a wide range of topics that describe the state of the art in parallel text processing and use. As I recalled in the introduction, the Rosetta stone won eternal fame as the prototype of parallel texts, but such texts are probably almost as old as the invention of writing. Nowadays, parallel texts are electronic, and they are becoming an increasingly important resource for building the natural language processing tools needed in the "multilingual information society" that is currently emerging at an incredible speed. Applications are numerous, and they are expanding every day: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc.

Book Neural Machine Translation

Download or read book Neural Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2020-06-18 with total page 409 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Book Building and Using Comparable Corpora for Multilingual Natural Language Processing

Download or read book Building and Using Comparable Corpora for Multilingual Natural Language Processing written by Serge Sharoff and published by Springer Nature. This book was released on 2023-08-23 with total page 138 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history on the topic followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, they provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Then, it is explained how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Book Improving Statistical Machine Translation Through Adaptation and Learning

Download or read book Improving Statistical Machine Translation Through Adaptation and Learning written by Carlos A. Henriquez Q. This book was released on 2014 with total page 109 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the arrival of free online machine translation (MT) systems came the possibility of improving automatic translations with the help of daily users. One way to achieve such improvements is to ask the users themselves for a better translation. The system may have made a mistake, and if the user is able to detect it, it would be valuable to let the user teach the system where it went wrong so that it does not repeat the mistake in a similar situation. Most of the translation systems you can find online provide a text area for users to suggest a better translation (like Google Translate) or a ranking system for them to use (like Microsoft's). In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project's objectives was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As part of the FAUST project, this thesis focused on developing one such mechanism. Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's attempt to translate them. To address this problem we divided it into three specific objectives:
  1. Define a relation between the words of a correction sentence and the words in the system's translation, in order to detect the errors that the former is aiming to solve.
  2. Include the error corrections in the original system, so it learns how to solve them in case a similar situation occurs.
  3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology.
The main contributions made to the SMT field that can be found in this Ph.D. thesis are:
  • We defined a similarity function that compares an MT system output with a translation reference for that output and aligns the errors made by the system with the correct translations found in the reference. This information is then used to compute an alignment between the original input sentence and the reference.
  • We defined a method to perform domain adaptation based on the alignment mentioned before. Using this alignment with an in-domain parallel corpus, we extract new translation units: both units that were already in the system and were correctly chosen during translation, and new units that include the correct translations found in the reference. These new units are then scored and combined with the units in the original system in order to improve its quality in terms of both human and automatic metrics.
  • We successfully applied the method to a new task: improving the translation quality of an SMT system using post-editions provided by real users of the system. In this case, the alignment was computed over a parallel corpus built with post-editions, extracting both units that were already in the system and were correctly chosen during translation, and new units that include the corrections found in the feedback provided.
  • The method proposed in this dissertation is able to achieve significant improvements in translation quality with a small amount of learning material, corresponding to 0.5% of the training material used to build the original system.
Results from our evaluations also indicate that the improvement achieved with the domain adaptation strategy is measurable by both automatic and human-based evaluation metrics.

Book Using Comparable Corpora for Under Resourced Areas of Machine Translation

Download or read book Using Comparable Corpora for Under Resourced Areas of Machine Translation written by Inguna Skadiņa and published by Springer. This book was released on 2019-02-06 with total page 326 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Book Machine Learning in Translation Corpora Processing

Download or read book Machine Learning in Translation Corpora Processing written by Krzysztof Wolk and published by CRC Press. This book was released on 2019-02-25 with total page 205 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well-suited for the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.

Book Phrase Alignment Models for Statistical Machine Translation

Download or read book Phrase Alignment Models for Statistical Machine Translation written by John Sturdy DeNero. This book was released on 2010 with total page 210 pages. Available in PDF, EPUB and Kindle. Book excerpt: The goal of a machine translation (MT) system is to automatically translate a document written in some human input language (e.g., Mandarin Chinese) into an equivalent document written in an output language (e.g., English). This task--so simple in its specification, and yet so rich in its complexities--has challenged computer science researchers for 60 years. While MT systems are in wide use today, the problem of producing human-quality translations remains unsolved. Statistical approaches have substantially improved the quality of MT systems by effectively exploiting parallel corpora: large collections of documents that have been translated by people, and therefore naturally occur in both the input and output languages. Broadly characterized, statistical MT systems translate an input document by matching fragments of its contents to examples in a parallel corpus, and then stitching together the translations of those fragments into a coherent document in an output language. The central challenge of this approach is to distill example translations into reusable parts: fragments of sentences that we know how to translate robustly and are likely to recur. Individual words are certainly common enough to recur, but they often cannot be translated correctly in isolation. At the other extreme, whole sentences can be translated without much context, but rarely repeat, and so cannot be recycled to build new translations. This thesis focuses on acquiring translations of phrases: contiguous sequences of a few words that encapsulate enough context to be translatable, but recur frequently in large corpora.
We automatically identify phrase-level translations that are contained within human-translated sentences by partitioning each sentence into phrases and aligning phrases across languages. This alignment-based approach to acquiring phrasal translations gives rise to statistical models of phrase alignment. A statistical phrase alignment model assigns a score to each possible analysis of a sentence-level translation, where an analysis describes which phrases within that sentence can be translated and how to translate them. If the model assigns a high score to a particular phrasal translation, we should be willing to reuse that translation in new sentences that contain the same phrase. Chapter 1 provides a non-technical introduction to phrase alignment models and machine translation. Chapter 2 describes a complete state-of-the-art phrase-based translation system to clarify the role of phrase alignment models. The remainder of this thesis presents a series of novel models, analyses, and experimental results that together constitute a thorough investigation of phrase alignment models for statistical machine translation. Chapter 3 presents the formal properties of the class of phrase alignment models, including inference algorithms and tractability results. We present two specific models, along with statistical learning techniques to fit their parameters to data. Our experimental evaluation identifies two primary challenges to training and employing phrase alignment models, and we address each of these in turn. The first broad challenge is that generative phrase models are structured to prefer very long, rare phrases. These models require external pressure to explain observed translations using small, reusable phrases rather than large, unique ones. Chapter 4 describes three Bayesian models and a corresponding Gibbs sampler to address this challenge. These models outperform the word-level models that are widely employed in research and production MT systems. 
The second broad challenge is structural: there are many consistent and coherent ways of analyzing a translated sentence using phrases. Long phrases, short phrases, and overlapping phrases can all simultaneously express correct, translatable units. However, no previous phrase alignment models have leveraged this rich structure to predict alignments. We describe a discriminative model of multi-scale, overlapping phrases that outperforms all previously proposed models. The cumulative result of this thesis is to establish model-based phrase alignment as the most effective approach to acquiring phrasal translations. Only phrase alignment models are able to incorporate statistical signals about multi-word constructions into alignment decisions and score coherent phrasal analyses of full sentence pairs. As a result, phrase alignment models outperform classical word-level models in both generative and discriminative settings. This result is fundamental to the field: the models proposed in this thesis address a general, language-independent alignment problem that arises in all state-of-the-art statistical machine translation systems in use today.

Book Statistical Machine Translation

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2010 with total page 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Book Exploration and Exploitation of Multilingual Data for Statistical Machine Translation

Download or read book Exploration and Exploitation of Multilingual Data for Statistical Machine Translation. This book was released on 2012 with total page 179 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Shortly after the birth of computer science, researchers realised the importance of machine translation as a task worthy of concentrated effort, but it is only recently that algorithms are able to provide automatic translations usable by the masses. Modern translation systems are dependent on bilingual corpora, a modern Rosetta Stone, from which they learn cross-lingual relationships that can be used to translate sentences which are not in the training corpus. This data is crucial. If it is insufficient, or out-of-domain, then translation quality degrades. To improve quality, we need to both perfect methods that extract usable translations from additional multilingual resources, and improve the constituent models of a translation system to better exploit existing multilingual data sets. In this thesis, we focus on these dual problems. Our approach is two-fold, and the thesis is structured accordingly. In part I we study the problem of extracting translations from the web, with a focus on exploiting the growing predominance of microblog platforms. We present novel methods for the language identification of microblog posts, and conduct a thorough analysis of existing methods that explore these microblog posts for new translations. In part II we study the orthogonal problem of improving language models for the tasks of reranking and source side morphological analysis. We begin by analysing a plethora of syntactic features for reranking n-best lists output from an automatic translation system. We then present a novel algorithm that allows for exact inference from high-order hidden Markov models, which we use to segment source text input.
In this way, the thesis gives insight into the retrieval of relevant training data, and introduces novel methods that better utilise existing multilingual corpora."--Omslag.

Book Bitext Alignment

    Book Details:
  • Author : Jörg Tiedemann
  • Publisher : Morgan & Claypool Publishers
  • Release : 2011
  • ISBN : 1608455106
  • Pages : 168 pages

Download or read book Bitext Alignment written by Jörg Tiedemann and published by Morgan & Claypool Publishers. This book was released on 2011 with total page 168 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks

Book Human Language Technology Challenges for Computer Science and Linguistics

Download or read book Human Language Technology Challenges for Computer Science and Linguistics written by Zygmunt Vetulani and published by Springer. This book was released on 2014-07-25 with total page 552 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 5th Language and Technology Conference: Challenges for Computer Science and Linguistics, LTC 2011, held in Poznan, Poland, in November 2011. The 44 revised and in many cases substantially extended papers presented in this volume were carefully reviewed and selected from 111 submissions. The focus of the papers is on the following topics: speech, parsing, computational semantics, text analysis, text annotation, language resources: general issues, language resources: ontologies and Wordnets and machine translation.

Book Corpora in Translation and Contrastive Research in the Digital Age

Download or read book Corpora in Translation and Contrastive Research in the Digital Age written by Julia Lavid-López and published by John Benjamins Publishing Company. This book was released on 2021-12-15 with total page 353 pages. Available in PDF, EPUB and Kindle. Book excerpt: Corpus-based contrastive and translation research are areas that keep evolving in the digital age, as the range of new corpus resources and tools expands, opening up to different approaches and application contexts. The current book contains a selection of papers which focus on corpora and translation research in the digital age, outlining some recent advances and explorations. After an introductory chapter which outlines language technologies applied to translation and interpreting with a view to identifying challenges and research opportunities, the first part of the book is devoted to current advances in the creation of new parallel corpora for under-researched areas, the development of tools to manage parallel corpora or as an alternative to parallel corpora, and new methodologies to improve existing translation memory systems. The contributions in the second part of the book address a number of cutting-edge linguistic issues in the area of contrastive discourse studies and translation analysis on the basis of comparable and parallel corpora in several languages such as English, German, Swedish, French, Italian, Spanish, Portuguese and Turkish, thus showcasing the richness of the linguistic diversity carried out in these recent investigations. Given the multiplicity of topics, methodologies and languages studied in the different chapters, the book will be of interest to a wide audience working in the fields of translation studies, contrastive linguistics and the automatic processing of language.

Book On Word Alignment Models for Statistical Machine Translation

Download or read book On Word Alignment Models for Statistical Machine Translation written by Shaojun Zhao. This book was released on 2011 with total page 240 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Machine translation remains the holy grail of computational linguistics. All statistical machine translation systems are built upon the idea of word alignment. While the field of word alignment has had tremendous progress in the last two decades, it is still in great need of speed and quality improvement. We designed a fertility hidden Markov model for word alignment, which is dramatically faster than the most widely used IBM Model 4. In fact, our model is even faster and has lower alignment error rate (AER) than the hidden Markov model. An experiment on Chinese-English translation shows that our word alignment model leads to better translation results than IBM Model 4, based on the BLEU metric. We also designed algorithms that mine massive and high quality bilingual texts for a variety of language pairs from the web using word alignment. The resulting data improved a state-of-the-art machine translation system."--Leaf v.
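The excerpt above cites alignment error rate (AER) without defining it. As a rough illustration only, and not code from the thesis, the sketch below computes AER in its commonly used form, 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the "sure" gold links and P the "possible" gold links (with S treated as a subset of P). The function name and the toy link sets are hypothetical.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Return AER for a set of predicted (source_index, target_index) links.

    `sure` and `possible` are gold-standard link sets; sure links are
    always counted as possible as well. Lower is better; 0.0 is perfect.
    """
    predicted = set(predicted)
    sure = set(sure)
    # Treat every sure link as possible, even if the caller omitted it.
    possible = (set(possible) | sure) if possible is not None else sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing expected
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


# Toy example (hypothetical data): three predicted links, two sure gold
# links, and one extra possible gold link that the aligner did not pick.
predicted_links = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}
aer = alignment_error_rate(predicted_links, sure_links, possible_links)
```

With these toy sets the aligner recovers both sure links but adds one link that is neither sure nor possible, so the score is above zero.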

Book Discriminative Alignment Models For Statistical Machine Translation

Download or read book Discriminative Alignment Models For Statistical Machine Translation written by Nadi Tomeh. This book was released on 2012. Available in PDF, EPUB and Kindle. Book excerpt: Bitext alignment is the task of aligning a text in a source language and its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed seamlessly across languages. It varies along linguistic properties and cultural backgrounds of different languages, and also depends on the translation strategy that has been used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process.
In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to at most one word in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems, which use phrases instead of words, a heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both the problems of word alignment and phrase pair extraction. We improve the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment. In this framework, links are predicted independently from one another using a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for a part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment. Additionally, input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state-of-the-art results in terms of alignment quality as measured by the alignment error rate, and translation quality as measured by BLEU on large-scale Arabic-English NIST'09 systems. We also present a translation quality informed procedure for both extraction and evaluation of phrase pairs. We reformulate the problem in the supervised framework in which we decide for each phrase pair whether we keep it or not in the translation model. This offers a principled way to combine several features to make the procedure more robust to alignment difficulties.
We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation. Using machine learning techniques based on positive examples only, these annotations can be used to learn phrase alignment decisions. Using this approach we obtain improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.
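The excerpt above mentions two baseline heuristics: combining two directional word alignments, and extracting phrase pairs consistent with a word alignment. The sketch below is a minimal illustration of those baseline ideas, not the discriminative method of the dissertation. It assumes the simplest combinations (intersection and union) in place of the more elaborate heuristics used in practice, and a brute-force consistency check; all function names and the toy data are hypothetical.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional alignments into intersection and union.

    `src_to_tgt` holds (source_index, target_index) links; `tgt_to_src`
    holds (target_index, source_index) links from the reverse direction.
    """
    forward = set(src_to_tgt)
    backward = {(i, j) for (j, i) in tgt_to_src}  # flip to (source, target)
    return forward & backward, forward | backward


def extract_phrase_pairs(links, src_len, tgt_len, max_len=3):
    """Enumerate phrase pairs consistent with `links`.

    A pair of spans is kept when it contains at least one link and no
    link connects a word inside one span to a word outside the other.
    Spans are inclusive (start, end) index pairs.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_len, tgt_len)):
                    has_link = any(
                        i1 <= i <= i2 and j1 <= j <= j2 for (i, j) in links
                    )
                    # A link "crosses" if exactly one endpoint is inside.
                    crosses = any(
                        (i1 <= i <= i2) != (j1 <= j <= j2) for (i, j) in links
                    )
                    if has_link and not crosses:
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


# Toy example (hypothetical data): a 2-word source and 3-word target.
forward_links = {(0, 0), (1, 1), (1, 2)}
backward_links = {(0, 0), (1, 1), (2, 2)}  # (target, source) order
intersection, union = symmetrize(forward_links, backward_links)
phrase_pairs = extract_phrase_pairs(intersection, src_len=2, tgt_len=2, max_len=2)
```

The intersection tends to give fewer, more reliable links and the union more links with more noise, which is why practical systems usually apply a heuristic that sits between the two.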

Book Corpus Linguistics for Translation and Contrastive Studies

Download or read book Corpus Linguistics for Translation and Contrastive Studies written by Mikhail Mikhailov and published by Routledge. This book was released on 2016-05-20 with total page 292 pages. Available in PDF, EPUB and Kindle. Book excerpt: Corpus Linguistics for Translation and Contrastive Studies provides a clear and practical introduction to using corpora in these fields. Giving special attention to parallel corpora, which are collections of texts in two or more languages, and demonstrating the potential benefits for multilingual corpus linguistics research to both translators and researchers, this book: explores the different types of parallel corpora available, and shows how to use basic and advanced search procedures to analyse them; explains how to compile a parallel corpus, and discusses their uses for translation purposes and to research linguistic phenomena across languages; demonstrates the use of corpus extracts across a wide range of texts, including dictionaries, novels by authors including Jane Austen and Mikhail Bulgakov, and newspapers such as The Sunday Times; is illustrated with case studies from a range of languages including Finnish, Russian, English and French. Written by two experienced researchers and practitioners, Corpus Linguistics for Translation and Contrastive Studies is essential reading for postgraduate students and researchers working within the area of translation and contrastive studies.