[EBOOK] Comparable Corpora In Cross Language Information Retrieval PDF Download

Cross-language information retrieval

Comparable Corpora in Cross language Information Retrieval

Book Details:

Author : Tuomas Talvensaari
Publisher :
Release : 2008
ISBN :
Pages : pages

Book excerpt: Abstract: "Cross-language information retrieval (CLIR) enables users to express queries in a language different from the language of the documents to be retrieved. For example, a Finnish-speaking person could pose a query to a CLIR system in Finnish (the source language) to retrieve documents written in English (the target language). The language barrier is usually crossed by translating the query into the target language, after which the documents can be retrieved with the methods of monolingual information retrieval (IR). Aligned text collections (corpora) are common query translation resources in CLIR. A parallel corpus is a collection where texts in one language are aligned with their translations in another language. The aligned texts of a comparable corpus are more loosely related. They are not translations, but share topics and include common vocabulary in the two languages. Both kinds of corpora can be used to train statistical translation models, but parallel corpora are preferred because more dependable translation knowledge can be derived from them. However, parallel corpora do not exist for all language pairs and domains. Hence, it is sometimes necessary to resort to noisier comparable corpora. This thesis proposes new methods for the acquisition, alignment, and employment of comparable corpora. The acquisition method is based on language-aware focused web crawling, where web content written in specific languages and discussing specific topics of interest is obtained by employing the hyperlink structure of the web. In the alignment phase, the source language documents are used as CLIR queries to retrieve target language documents. The similarity of the query to the documents, and various other factors, are used as evidence to form alignments between the source and target language documents. The constructed corpora were employed in query translation as a cross-language similarity thesaurus, a structure where target language words are ranked based on their similarity with a source language word that is given as input. The highest ranking words are assumed to be either translations of the input word or related to it in some other manner. The methods were evaluated with extensive IR experiments that covered different language pairs, domains, and test data. The proposed CLIR approach was combined with approaches based on bilingual dictionaries. The combined approaches outperformed pure dictionary-based translation. In addition, the comparable corpus translation performed better in domain-specific CLIR than translation utilizing high-quality parallel corpora. This suggests that the proposed methods are particularly useful in domains where CLIR resources are scarce."
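
The cross-language similarity thesaurus described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy aligned pairs, the raw term-frequency weighting, the cosine measure, and the names `build_vectors`, `cosine`, and `rank_targets` are all assumptions made for illustration. Each word is represented by the alignments it occurs in, and target language words are ranked by how similar their occurrence pattern is to that of the source language input word.

```python
import math
from collections import Counter, defaultdict


def build_vectors(aligned_pairs):
    """Map every word to a Counter {alignment id: term frequency}."""
    src_vecs, tgt_vecs = defaultdict(Counter), defaultdict(Counter)
    for pair_id, (src_tokens, tgt_tokens) in enumerate(aligned_pairs):
        for word in src_tokens:
            src_vecs[word][pair_id] += 1
        for word in tgt_tokens:
            tgt_vecs[word][pair_id] += 1
    return src_vecs, tgt_vecs


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_targets(word, src_vecs, tgt_vecs, top_k=3):
    """Rank target language words by similarity to a source language word."""
    if word not in src_vecs:
        return []  # unseen source word: no evidence in the corpus
    src = src_vecs[word]
    scored = [(t, cosine(src, vec)) for t, vec in tgt_vecs.items()]
    scored = [(t, s) for t, s in scored if s > 0]
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical Finnish-English aligned document pairs (already tokenized).
pairs = [
    (["auto", "moottori"], ["car", "engine"]),
    (["auto", "hinta"], ["car", "price"]),
    (["talous", "hinta"], ["economy", "price"]),
]
src_vecs, tgt_vecs = build_vectors(pairs)
ranking = rank_targets("auto", src_vecs, tgt_vecs)
for target_word, score in ranking:
    print(f"{target_word}\t{score:.3f}")
```

In this toy corpus "car" ranks first for "auto" because both occur in exactly the same alignments, while "engine" and "price" rank lower as merely related words. A realistic system would need much larger corpora and a more careful weighting scheme than raw frequencies.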

Computers

Cross Language Information Retrieval

Book Details:

Author : Jian-Yun Nie
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 303102138X
Pages : 125 pages

Book excerpt: The search for information is no longer limited to the native language of the user, but increasingly extends to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness using different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers can also find more technical details and discussions of the research challenges that remain. It is suitable for new researchers who intend to carry out research on CLIR. Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography

Electronic books

Evaluating Information Retrieval and Access Tasks

Book Details:

Author : Tetsuya Sakai
Publisher : Springer Nature
Release : 1901
ISBN : 9811555540
Pages : 225 pages

Book excerpt: This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today's smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. The chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students--anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.

Computers

Building and Using Comparable Corpora

Book Details:

Author : Serge Sharoff
Publisher : Springer Science & Business Media
Release : 2013-12-13
ISBN : 3642201288
Pages : 333 pages

Book excerpt: The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Computers

Cross Language Information Retrieval

Book Details:

Author : Gregory Grefenstette
Publisher : Springer Science & Business Media
Release : 2012-12-06
ISBN : 1461556619
Pages : 190 pages

Book excerpt: Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval that was held August 22, 1996 during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, and they have held conferences yearly since 1977. Three additional papers have been added: Chapter 4, "Distributed Cross-Lingual Information Retrieval", describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated; Chapter 6, "Mapping Vocabularies Using Latent Semantic Indexing", which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval; and Chapter 10, "A Weighted Boolean Model for Cross Language Text Retrieval", describes a recent approach to solving the translation term weighting problem, specific to Cross-Language Information Retrieval. Gregory Grefenstette. Contributors: Lisa Ballesteros and W. Bruce Croft (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts); David Hull and Gregory Grefenstette (Xerox Research Centre Europe, Grenoble Laboratory); Thomas K. Landauer (Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder); Mark W. Davis (Computing Research Lab, New Mexico State University); Michael L. Littman; Bonnie J.

Computers

Building and Using Comparable Corpora for Multilingual Natural Language Processing

Book Details:

Author : Serge Sharoff
Publisher : Springer
Release : 2023-07-01
ISBN : 9783031313837
Pages : 0 pages

Book excerpt: This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history on the topic followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Then, it is explained how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

Language Arts & Disciplines

Advances in Cross Language Information Retrieval

Book Details:

Author : Martin Braschler
Publisher : Springer
Release : 2003-11-17
ISBN : 3540452370
Pages : 835 pages

Book excerpt: This book presents the thoroughly refereed post-proceedings of a workshop by the Cross-Language Evaluation Forum Campaign, CLEF 2002, held in Rome, Italy in September 2002. The 43 revised full papers presented together with an introduction and run data in an appendix were carefully reviewed and revised upon presentation at the workshop. The papers are organized in topical sections on systems evaluation experiments, cross language and more, monolingual experiments, mainly domain-specific information retrieval, interactive issues, cross-language spoken document retrieval, and cross-language evaluation issues and initiatives.

Computers

Cross Language Information Retrieval and Evaluation

Book Details:

Author : Cross-Language Evaluation Forum. Workshop
Publisher : Springer Science & Business Media
Release : 2001-08-29
ISBN : 3540424466
Pages : 396 pages

Book excerpt: This book presents the thoroughly refereed post-proceedings of the international Cross-Language Evaluation Forum Workshop organized by the CLEF activity of the European DELOS Network of Excellence for Digital Libraries. The 25 revised papers presented together with an introduction were carefully selected based on two rounds of reviewing. All current aspects of cross-language information retrieval are addressed, ranging from foundational issues and systems evaluation to applications in a variety of fields.

Computers

Cross Language Information Retrieval and Evaluation

Book Details:

Author : Carol Peters
Publisher : Springer
Release : 2003-06-29
ISBN : 3540446451
Pages : 396 pages

Book excerpt: The first evaluation campaign of the Cross-Language Evaluation Forum (CLEF) for European languages was held from January to September 2000. The campaign culminated in a two-day workshop in Lisbon, Portugal, 21-22 September, immediately following the fourth European Conference on Digital Libraries (ECDL 2000). The first day of the workshop was open to anyone interested in the area of Cross-Language Information Retrieval (CLIR) and addressed the topic of CLIR system evaluation. The goal was to identify the actual contribution of evaluation to system development and to determine what could be done in the future to stimulate progress. The second day was restricted to participants in the CLEF 2000 evaluation campaign and to their experiments. This volume constitutes the proceedings of the workshop and provides a record of the campaign. CLEF is currently an activity of the DELOS Network of Excellence for Digital Libraries, funded by the EC Information Society Technologies to further research in digital library technologies. The activity is organized in collaboration with the US National Institute of Standards and Technology (NIST). The support of DELOS and NIST in the running of the evaluation campaign is gratefully acknowledged. I should also like to thank the other members of the Workshop Steering Committee for their assistance in the organization of this event.

Language Arts & Disciplines

Parallel Corpora for Contrastive and Translation Studies

Book Details:

Author : Irene Doval
Publisher : John Benjamins Publishing Company
Release : 2019-03-20
ISBN : 9027262845
Pages : 313 pages

Book excerpt: This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances in both recent developments of parallel corpora, with some particular references to comparable corpora as well, and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.

Computers

Comparable Corpora and Computer assisted Translation

Book Details:

Author : Estelle Maryline Delpech
Publisher : John Wiley & Sons
Release : 2014-07-22
ISBN : 1119002702
Pages : 221 pages

Book excerpt: Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another. This work has two primary objectives. The first is to assess the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations. The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability – methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation. The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).

Computers

Evaluation of Cross Language Information Retrieval Systems

Book Details:

Author : Martin Braschler
Publisher : Springer
Release : 2003-08-02
ISBN : 3540456910
Pages : 606 pages

Book excerpt: The second evaluation campaign of the Cross Language Evaluation Forum (CLEF) for European languages was held from January to September 2001. This campaign proved a great success, and showed an increase in participation of around 70% compared with CLEF 2000. It culminated in a two day workshop in Darmstadt, Germany, 3–4 September, in conjunction with the 5th European Conference on Digital Libraries (ECDL 2001). On the first day of the workshop, the results of the CLEF 2001 evaluation campaign were reported and discussed in paper and poster sessions. The second day focused on the current needs of cross language systems and how evaluation campaigns in the future can best be designed to stimulate progress. The workshop was attended by nearly 50 researchers and system developers from both academia and industry. It provided an important opportunity for researchers working in the same area to get together and exchange ideas and experiences. Copies of all the presentations are available on the CLEF web site at http://www.clef-campaign.org. This volume contains thoroughly revised and expanded versions of the papers presented at the workshop and provides an exhaustive record of the CLEF 2001 campaign. CLEF 2001 was conducted as an activity of the DELOS Network of Excellence for Digital Libraries, funded by the EC Information Society Technologies program to further research in digital library technologies. The activity was organized in collaboration with the US National Institute of Standards and Technology (NIST).

Computers

Using Comparable Corpora for Under Resourced Areas of Machine Translation

Book Details:

Author : Inguna Skadiņa
Publisher : Springer
Release : 2019-02-06
ISBN : 3319990047
Pages : 326 pages

Book excerpt: This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Computers

Accessing Multilingual Information Repositories

Book Details:

Author : Fredric Gey
Publisher : Springer
Release : 2006-10-15
ISBN : 3540457003
Pages : 1032 pages

Book excerpt: This book constitutes the thoroughly refereed postproceedings of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005. The book presents 111 revised papers together with an introduction. Topical sections include multilingual textual document retrieval, cross-language and more, monolingual experiments, domain-specific information retrieval, interactive cross-language information retrieval, multiple language question answering, cross-language retrieval in image collections, cross-language speech retrieval, multilingual Web track, cross-language geographical retrieval, and evaluation issues.

Computers

Advances in Cross Language Information Retrieval

Book Details:

Author : Cross-Language Evaluation Forum. Workshop
Publisher : Springer Science & Business Media
Release : 2003-10-10
ISBN : 3540408304
Pages : 832 pages

Book excerpt: This book presents the thoroughly refereed post-proceedings of a workshop by the Cross-Language Evaluation Forum Campaign, CLEF 2002, held in Rome, Italy in September 2002. The 43 revised full papers presented together with an introduction and run data in an appendix were carefully reviewed and revised upon presentation at the workshop. The papers are organized in topical sections on systems evaluation experiments, cross language and more, monolingual experiments, mainly domain-specific information retrieval, interactive issues, cross-language spoken document retrieval, and cross-language evaluation issues and initiatives.

Computers

Artificial Intelligence and Security

Book Details:

Author : Xingming Sun
Publisher : Springer Nature
Release : 2020-09-12
ISBN : 9811580839
Pages : 719 pages

Book excerpt: The 3-volume set CCIS 1252 until CCIS 1254 constitutes the refereed proceedings of the 6th International Conference on Artificial Intelligence and Security, ICAIS 2020, which was held in Hohhot, China, in July 2020. The conference was formerly called “International Conference on Cloud Computing and Security” with the acronym ICCCS. The total of 178 full papers and 8 short papers presented in this 3-volume proceedings was carefully reviewed and selected from 1064 submissions. The papers were organized in topical sections as follows: Part I: artificial intelligence; Part II: artificial intelligence; Internet of things; information security; Part III: information security; big data and cloud computing; information processing.