EBookClubs

Read Books & Download eBooks Full Online

Book Exploiting Comparable Corpora for Domain specific Statistical Machine Translation

Download or read book Exploiting Comparable Corpora for Domain specific Statistical Machine Translation written by Magdalena Plamadă. This book was released on 2018. Available in PDF, EPUB and Kindle.

Book Using Comparable Corpora for Under Resourced Areas of Machine Translation

Download or read book Using Comparable Corpora for Under Resourced Areas of Machine Translation written by Inguna Skadiņa and published by Springer. This book was released on 2019-02-06 with total page 326 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Book Exploration and Exploitation of Multilingual Data for Statistical Machine Translation

Download or read book Exploration and Exploitation of Multilingual Data for Statistical Machine Translation. This book was released on 2012 with total page 179 pages. Available in PDF, EPUB and Kindle. Book excerpt: "Shortly after the birth of computer science, researchers realised the importance of machine translation as a task worthy of concentrated effort, but it is only recently that algorithms are able to provide automatic translations usable by the masses. Modern translation systems are dependent on bilingual corpora, a modern Rosetta Stone, from which they learn cross-lingual relationships that can be used to translate sentences which are not in the training corpus. This data is crucial. If it is insufficient, or out-of-domain, then translation quality degrades. To improve quality, we need to both perfect methods that extract usable translation from additional multilingual resources, and improve the constituent models of a translation system to better exploit existing multilingual data sets. In this thesis, we focus on these dual problems. Our approach is two-fold, and the thesis is structured accordingly. In part I we study the problem of extracting translations from the web, with a focus on exploiting the growing predominance of microblog platforms. We present novel methods for the language identification of microblog posts, and conduct a thorough analysis of existing methods that explore these microblog posts for new translations. In part II we study the orthogonal problem of improving language models for the tasks of reranking and source side morphological analysis. We begin by analysing a plethora of syntactic features for reranking n-best lists output from an automatic translation system. We then present a novel algorithm that allows for exact inference from high-order hidden Markov models, which we use to segment source text input. 
In this way, the thesis gives insight into the retrieval of relevant training data, and introduces novel methods that better utilise existing multilingual corpora."--Omslag.

Book Intelligent Natural Language Processing  Trends and Applications

Download or read book Intelligent Natural Language Processing Trends and Applications written by Khaled Shaalan and published by Springer. This book was released on 2017-11-17 with total page 763 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book brings together scientists, researchers, practitioners, and students from academia and industry to present recent and ongoing research activities concerning the latest advances, techniques, and applications of natural language processing systems, and to promote the exchange of new ideas and lessons learned. Taken together, the chapters of this book provide a collection of high-quality research works that address broad challenges in both theoretical and applied aspects of intelligent natural language processing. The book presents the state-of-the-art in research on natural language processing, computational linguistics, applied Arabic linguistics and related areas. New trends in natural language processing systems are rapidly emerging – and finding application in various domains including education, travel and tourism, and healthcare, among others. Many issues encountered during the development of these applications can be resolved by incorporating language technology solutions. The topics covered by the book include: Character and Speech Recognition; Morphological, Syntactic, and Semantic Processing; Information Extraction; Information Retrieval and Question Answering; Text Classification and Text Mining; Text Summarization; Sentiment Analysis; Machine Translation; Building and Evaluating Linguistic Resources; and Intelligent Language Tutoring Systems.

Book Machine Learning in Translation Corpora Processing

Download or read book Machine Learning in Translation Corpora Processing written by Krzysztof Wolk and published by CRC Press. This book was released on 2019-02-25 with total page 205 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well-suited for the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.

Book Building and Using Comparable Corpora

Download or read book Building and Using Comparable Corpora written by Serge Sharoff and published by Springer Science & Business Media. This book was released on 2013-12-13 with total page 333 pages. Available in PDF, EPUB and Kindle. Book excerpt: The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Book Statistical Machine Translation

Download or read book Statistical Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2010 with total page 447 pages. Available in PDF, EPUB and Kindle. Book excerpt: The dream of automatic language translation is now closer thanks to recent advances in the techniques that underpin statistical machine translation. This class-tested textbook from an active researcher in the field provides a clear and careful introduction to the latest methods and explains how to build machine translation systems for any two languages. It introduces the subject's building blocks from linguistics and probability, then covers the major models for machine translation: word-based, phrase-based, and tree-based, as well as machine translation evaluation, language modeling, discriminative training and advanced methods to integrate linguistic annotation. The book also reports the latest research, presents the major outstanding challenges, and enables novices as well as experienced researchers to make novel contributions to this exciting area. Ideal for students at undergraduate and graduate level, or for anyone interested in the latest developments in machine translation.

Book Improving Statistical Machine Translation Using Comparable Corpora

Download or read book Improving Statistical Machine Translation Using Comparable Corpora written by Matthew Garvey Snover. This book was released on 2010. Available in PDF, EPUB and Kindle.

Book Annotation  exploitation and evaluation of parallel corpora  TC3 I

Download or read book Annotation exploitation and evaluation of parallel corpora TC3 I written by Silvia Hansen-Schirra and published by Language Science Press. This book was released on 2017-02-27 with total page 165 pages. Available in PDF, EPUB and Kindle. Book excerpt: Exchange between the translation studies and the computational linguistics communities has traditionally not been very intense. Among other things, this is reflected by the different views on parallel corpora. While computational linguistics does not always strictly pay attention to the translation direction (e.g. when translation rules are extracted from (sub)corpora which actually only consist of translations), translation studies are amongst other things concerned with exactly comparing source and target texts (e.g. to draw conclusions on interference and standardization effects). However, there has recently been more exchange between the two fields – especially when it comes to the annotation of parallel corpora. This special issue brings together the different research perspectives. Its contributions show – from both perspectives – how the communities have come to interact in recent years.

Book Neural Machine Translation

Download or read book Neural Machine Translation written by Philipp Koehn and published by Cambridge University Press. This book was released on 2020-06-18 with total page 409 pages. Available in PDF, EPUB and Kindle. Book excerpt: Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Book Advances in Natural Language Processing

Download or read book Advances in Natural Language Processing written by Hitoshi Isahara and published by Springer. This book was released on 2012-10-22 with total page 343 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 8th International Conference on Advances in Natural Language Processing, JapTAL 2012, Kanazawa, Japan, in October 2012. The 27 revised full papers and 5 revised short papers presented were carefully reviewed and selected from 42 submissions. The papers are organized in topical sections on machine translation, multilingual issues, resources, semantic analysis, sentiment analysis, as well as speech and generation.

Book Comparable Corpora and Computer assisted Translation

Download or read book Comparable Corpora and Computer assisted Translation written by Estelle Maryline Delpech and published by John Wiley & Sons. This book was released on 2014-07-22 with total page 221 pages. Available in PDF, EPUB and Kindle. Book excerpt: Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another. This work had two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations. The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability – methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation. The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).

Book Machine Translation with Minimal Reliance on Parallel Resources

Download or read book Machine Translation with Minimal Reliance on Parallel Resources written by George Tambouratzis and published by Springer. This book was released on 2017-08-09 with total page 92 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a unified view on a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while only assuming the existence of a very limited parallel corpus, thus having a unique starting point to Statistical Machine Translation (SMT). In this book, a detailed presentation of the methodology principles and system architecture is followed by a series of experiments, where the proposed system is compared to other MT systems using a set of established metrics including BLEU, NIST, Meteor and TER. Additionally, free-to-use code is available that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for the readers are very limited and include a basic understanding of machine translation as well as of the basic tools of natural language processing.

Book Machine Learning in Translation Corpora Processing

Download or read book Machine Learning in Translation Corpora Processing written by Krzysztof Wolk and published by CRC Press. This book was released on 2019-02-25 with total page 264 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well-suited for the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.

Book Empirical Methods for Exploiting Parallel Texts

Download or read book Empirical Methods for Exploiting Parallel Texts written by I. Dan Melamed and published by MIT Press. This book was released on 2001 with total page 224 pages. Available in PDF, EPUB and Kindle. Book excerpt: Parallel texts (bitexts) are a goldmine of linguistic knowledge, because the translation of a text into another language can be viewed as a detailed annotation of what that text means. Knowledge about translational equivalence, which can be gleaned from bitexts, is of central importance for applications such as manual and machine translation, cross-language information retrieval, and corpus linguistics. The availability of bitexts has increased dramatically since the advent of the Web, making their study an exciting new area of research in natural language processing. This book lays out the theory and the practical techniques for discovering and applying translational equivalence at the lexical level. It is a start-to-finish guide to designing and evaluating many translingual applications.

Book Computational Linguistics

Download or read book Computational Linguistics written by Kôiti Hasida and published by Springer. This book was released on 2018-03-05 with total page 361 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 15th International Conference of the Pacific Association for Computational Linguistics, PACLING 2017, held in Yangon, Myanmar, in August 2017. The 28 revised full papers presented were carefully reviewed and selected from 50 submissions. The papers are organized in topical sections on semantics and semantic analysis; statistical machine translation; corpora and corpus-based language processing; syntax and syntactic analysis; document classification; information extraction and text mining; text summarization; text and message understanding; automatic speech recognition; spoken language and dialogue; speech pathology; speech analysis.

Book Comparable Corpora in Cross language Information Retrieval

Download or read book Comparable Corpora in Cross language Information Retrieval written by Tuomas Talvensaari. This book was released on 2008. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "Cross-language information retrieval (CLIR) enables users to express queries in a language different from the language of the documents to be retrieved. For example, a Finnish-speaking person could pose a query to a CLIR system in Finnish (the source language) to retrieve documents written in English (the target language). The language barrier is usually crossed by translating the query into the target language, after which the documents can be retrieved with the methods of monolingual information retrieval (IR). Aligned text collections (corpora) are common query translation resources in CLIR. A parallel corpus is a collection where texts in one language are aligned with their translations in another language. The aligned texts of a comparable corpus are more loosely related. They are not translations, but share topics and include common vocabulary in the two languages. Both kinds of corpora can be used to train statistical translation models, but parallel corpora are preferred because more dependable translation knowledge can be derived from them. However, parallel corpora do not exist for all language pairs and domains. Hence, it is sometimes necessary to resort to noisier comparable corpora. This thesis proposes new methods for the acquisition, alignment, and employment of comparable corpora. The acquisition method is based on language-aware focused web crawling, where web content written in specific languages and discussing specific topics of interest is obtained by employing the hyperlink structure of the web. In the alignment phase, the source language documents are used as CLIR queries to retrieve target language documents. 
The similarity of the query to the documents, and various other factors, are used as evidence to form alignments between the source and target language documents. The constructed corpora were employed in query translation as a cross-language similarity thesaurus, a structure where target language words are ranked based on their similarity with a source language word that is given as input. The highest ranking words are assumed to be either translations of the input word or related to it in some other manner. The methods were evaluated with extensive IR experiments that covered different language pairs, domains, and test data. The proposed CLIR approach was combined with approaches based on bilingual dictionaries. The combined approaches outperformed pure dictionary-based translation. In addition, the comparable corpus translation performed better in domain-specific CLIR than translation utilizing high-quality parallel corpora. This suggests that the proposed methods are particularly useful in domains where CLIR resources are scarce."
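As a rough illustration of the cross-language similarity thesaurus idea described in the abstract above, the following minimal sketch ranks target-language words by how similarly they are distributed across aligned document pairs. It is not the thesis's method: the abstract does not state the similarity measure, so cosine similarity over alignment co-occurrence counts is an assumption, and the function names and the tiny Finnish-English sample are hypothetical, invented only for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_word_vectors(aligned_pairs):
    """Represent each word by the aligned pairs it occurs in (pair id -> count)."""
    src_vectors = defaultdict(Counter)
    tgt_vectors = defaultdict(Counter)
    for pair_id, (src_doc, tgt_doc) in enumerate(aligned_pairs):
        for word in src_doc.lower().split():
            src_vectors[word][pair_id] += 1
        for word in tgt_doc.lower().split():
            tgt_vectors[word][pair_id] += 1
    return src_vectors, tgt_vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_target_words(source_word, src_vectors, tgt_vectors, top_k=3):
    """Return up to top_k target words with non-zero similarity, best first."""
    query = src_vectors.get(source_word)
    if not query:
        return []  # source word never seen in the aligned corpus
    scored = [(w, cosine(query, vec)) for w, vec in tgt_vectors.items()]
    scored = [(w, s) for w, s in scored if s > 0]
    scored.sort(key=lambda ws: (-ws[1], ws[0]))  # score descending, then alphabetical
    return scored[:top_k]


# Hypothetical toy corpus: each tuple is one aligned (Finnish, English) document pair.
aligned_pairs = [
    ("syöpä hoito", "cancer treatment"),
    ("syöpä tutkimus", "cancer research"),
    ("metsä tutkimus", "forest research"),
]
src, tgt = build_word_vectors(aligned_pairs)
ranking = rank_target_words("syöpä", src, tgt)
for word, score in ranking:
    print(f"{word}\t{score:.3f}")
```

On this toy input the top-ranked word for "syöpä" is "cancer", followed by words that merely co-occur with it, which mirrors the abstract's remark that high-ranking words are either translations of the input word or related to it in some other manner.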