
Automatic Speech Recognition for Low resource Languages and Accents Using Multilingual and Crosslingual Information

Book Details:

Author : Ngoc Thang Vu
Publisher :
Release : 2014
ISBN : 9783844028928
Pages : 203 pages


Computers

Automatic Speech Recognition and Translation for Low Resource Languages

Book Details:

Author : L. Ashok Kumar
Publisher : John Wiley & Sons
Release : 2024-03-28
ISBN : 1394214170
Pages : 428 pages

Book excerpt: This book is a comprehensive exploration of the cutting-edge research, methodologies, and advancements that address the unique challenges of ASR and translation for low-resource languages. Automatic Speech Recognition and Translation for Low Resource Languages contains groundbreaking research from experts and researchers sharing innovative solutions that address language challenges in low-resource environments. The book begins by delving into the fundamental concepts of ASR and translation, providing readers with a solid foundation for understanding the subsequent chapters. It then explores the intricacies of low-resource languages, analyzing the factors that contribute to their challenges and the significance of developing tailored solutions to overcome them. The chapters cover a wide range of topics, spanning both the theoretical and practical aspects of ASR and translation for low-resource languages. The book discusses data augmentation techniques, transfer learning, and multilingual training approaches that leverage existing linguistic resources to improve accuracy and performance. Additionally, it investigates the possibilities offered by unsupervised and semi-supervised learning, as well as the benefits of active learning and crowdsourcing in enriching the training data. Throughout the book, emphasis is placed on the importance of considering the cultural and linguistic context of low-resource languages, recognizing the unique nuances and intricacies that influence accurate ASR and translation.
Furthermore, the book explores the potential impact of these technologies in various domains, such as healthcare, education, and commerce, empowering individuals and communities by breaking down language barriers. Audience: The book targets researchers and professionals in the fields of natural language processing, computational linguistics, and speech technology. It will also be of interest to engineers, linguists, and individuals in industries and organizations working on cross-lingual communication, accessibility, and global connectivity.

Exploiting Resources from Closely related Languages for Automatic Speech Recognition in Low resource Languages from Malaysia

Book Details:

Author : Sarah Flora Samson Juan
Publisher :
Release : 2015
ISBN :
Pages : 0 pages

Book excerpt: Languages in Malaysia are dying at an alarming rate. As of today, 15 languages are in danger while two languages are extinct. One of the methods to save languages is by documenting them, but this is a tedious task when performed manually.
An Automatic Speech Recognition (ASR) system could be a tool to help speed up the process of documenting speech from native speakers. However, building ASR systems for a target language requires a large amount of training data, as current state-of-the-art techniques are based on empirical approaches. Hence, there are many challenges in building ASR for languages that have limited data available.
The main aim of this thesis is to investigate the effects of using data from closely related languages to build ASR for low-resource languages in Malaysia. Past studies have shown that cross-lingual and multilingual methods could improve the performance of low-resource ASR. In this thesis, we try to answer several questions concerning these approaches: How do we know which language is beneficial for our low-resource language? How does the relationship between source and target languages influence speech recognition performance? Is pooling language data an optimal approach for a multilingual strategy?
Our case study is Iban, an under-resourced language spoken on the island of Borneo. We study the effects of using data from Malay, a locally dominant language that is close to Iban, for developing Iban ASR under different resource constraints. We have proposed several approaches to adapt Malay data to obtain pronunciation and acoustic models for Iban speech.
Building a pronunciation dictionary from scratch is time consuming, as one needs to properly define the sound units of each word in a vocabulary. We developed a semi-supervised approach to quickly build a pronunciation dictionary for Iban. It was based on bootstrapping techniques for improving Malay data to match Iban pronunciations.
To increase the performance of low-resource acoustic models, we explored two acoustic modelling techniques: Subspace Gaussian Mixture Models (SGMM) and Deep Neural Networks (DNN). We applied cross-lingual strategies using both frameworks for adapting out-of-language data to Iban speech. Results show that using Malay data is beneficial for increasing the performance of Iban ASR. We also tested SGMM and DNN to improve low-resource non-native ASR. We proposed a fine merging strategy for obtaining an optimal multi-accent SGMM. In addition, we developed an accent-specific DNN using native speech data. After applying both methods, we obtained significant improvements in ASR accuracy. From our study, we observe that using SGMM and DNN for a cross-lingual strategy is effective when training data is very limited.

Multilingual Techniques for Low Resource Automatic Speech Recognition

Book Details:

Author : Ekapol Chuangsuwanich
Publisher :
Release : 2016
ISBN :
Pages : 143 pages

Book excerpt: Out of the approximately 7000 languages spoken around the world, there are only about 100 languages with Automatic Speech Recognition (ASR) capability. This is because a vast amount of resources is required to build a speech recognizer. This often includes thousands of hours of transcribed speech data, a phonetic pronunciation dictionary or lexicon that spans all words in the language, and a text collection on the order of several million words. Moreover, ASR technologies usually require years of research in order to deal with the specific idiosyncrasies of each language. This makes building a speech recognizer for a language with few resources a daunting task. In this thesis, we propose a universal ASR framework for transcription and keyword spotting (KWS) tasks that works on a variety of languages. We investigate methods to deal with the need for a pronunciation dictionary by using a Pronunciation Mixture Model that can learn from existing lexicons and acoustic data to generate pronunciations for new words. In the case when no dictionary is available, a graphemic lexicon provides comparable performance to the expert lexicon. To alleviate the need for text corpora, we investigate the use of subwords and web data, which helps improve KWS results. Finally, we reduce the need for speech recordings by using bottleneck (BN) features trained on multilingual corpora. We first propose the Low-rank Stacked Bottleneck architecture, which improves ASR performance over previous state-of-the-art systems. We then investigate a method to select data from various languages that is most similar to the target language in a data-driven manner, which helps improve the effectiveness of the BN features.
Using techniques described and proposed in this thesis, we are able to more than double the KWS performance for a low-resource language compared to using standard techniques geared towards rich resource domains.
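The graphemic lexicon mentioned in the excerpt above can be illustrated with a minimal sketch: each word is "pronounced" as its own sequence of letters, so no expert phonetic dictionary is needed. The function name `build_graphemic_lexicon` and the sample words are hypothetical, and this is an illustration of the general idea rather than the thesis's implementation.

```python
def build_graphemic_lexicon(words):
    """Map each word to its sequence of graphemes (letters).

    Assumption for this sketch: one alphabetic character equals one unit;
    real systems may treat digraphs, diacritics, or scripts differently.
    """
    lexicon = {}
    for word in words:
        normalized = word.lower()
        # Keep alphabetic characters only; each letter is one modelling unit.
        graphemes = [ch for ch in normalized if ch.isalpha()]
        if graphemes:
            lexicon[normalized] = graphemes
    return lexicon


if __name__ == "__main__":
    for word, units in sorted(build_graphemic_lexicon(["Speech", "data"]).items()):
        print(word, " ".join(units))
```

The trade-off is that letter sequences only approximate pronunciation, which is why the excerpt's finding of comparable performance to an expert lexicon is notable.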

Automatic speech recognition

Data Augmentation for Automatic Speech Recognition for Low Resource Languages

Book Details:

Author : Ronit Damania
Publisher :
Release : 2021
ISBN :
Pages : 37 pages

Book excerpt: "In this thesis, we explore several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) on low-resource languages. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare the well-known SpecAugment augmentation approach to these new methods, along with several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages using the 40-hour Gujarati, Tamil, Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Our data augmentation approaches, coupled with state-of-the-art acoustic model architectures and language models, yield reductions in word error rate over SpecAugment and other competitive baselines for the LibriSpeech-100 dataset, showing a particular advantage over prior models for the 'other', more challenging, dev and test sets. Extending this work to the low-resource Indian languages, we see large improvements over the baseline models and results comparable to large multilingual models."--Abstract.
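For readers unfamiliar with the SpecAugment baseline named in the abstract, the following is a minimal sketch of its two masking operations (frequency masking and time masking) on a spectrogram stored as a list of frames. It omits SpecAugment's time-warping step, masks with zeros, and applies one mask of each kind; the function name and default mask widths are assumptions for illustration, not values from the thesis.

```python
import random


def spec_augment(spectrogram, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a masked copy of `spectrogram` (a non-empty list of equal-length frames)."""
    if rng is None:
        rng = random.Random()
    num_frames = len(spectrogram)
    num_bins = len(spectrogram[0])
    masked = [list(frame) for frame in spectrogram]  # leave the input untouched

    # Frequency mask: zero a random band of consecutive bins in every frame.
    f_width = rng.randint(0, min(max_freq_mask, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for frame in masked:
        for b in range(f_start, f_start + f_width):
            frame[b] = 0.0

    # Time mask: zero a random run of consecutive frames.
    t_width = rng.randint(0, min(max_time_mask, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [0.0] * num_bins

    return masked
```

Because the masks are drawn afresh on every call, the same utterance yields a different training example each epoch, which is what makes the method useful when transcribed audio is scarce.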

Automatic speech recognition

Cross lingual Language Modeling for Low resource Speech Recognition

Book Details:

Author : Ping Xu
Publisher :
Release : 2012
ISBN :
Pages : 69 pages


Technology & Engineering

Advances in Electronics Engineering

Book Details:

Author : Zahriladha Zakaria
Publisher : Springer Nature
Release : 2019-12-16
ISBN : 9811512892
Pages : 332 pages

Book excerpt: This book presents the proceedings of ICCEE 2019, held in Kuala Lumpur, Malaysia, on 29th–30th April 2019. It includes the latest advances in electrical engineering and electronics from leading experts around the globe.

Multilingual Vocabularies in Automatic Speech Recognition

Book Details:

Author :
Publisher :
Release : 2000
ISBN :
Pages : 5 pages

Book excerpt: The paper describes a method for dealing with multilingual vocabularies in speech recognition tasks. We present an approach that combines acoustic descriptive precision with the capability to generalize to multiple languages. The approach is based on the concept of classes of transitions between phones. The classes are defined by means of objective measures of acoustic similarity among sounds of different languages. This procedure stems from the definition of a general language-independent model. When a new language is to be added, the phonological structure of the language is mapped onto the set of classes belonging to the general model. Subsequently, if a limited amount of language-specific speech data becomes available for the new language, we identify those sounds which require the definition of additional classes. The experiments have been conducted in Italian, English and Spanish. The method can also be considered a way of implementing cross-lingual porting of recognition models for rapid prototyping of recognizers in a new target language, specifically in cases where the collection of large training databases would be economically infeasible.

Cross language Acoustic Adaptation for Automatic Speech Recognition

Book Details:

Author : Christoph Nieuwoudt
Publisher :
Release : 2013
ISBN :
Pages : pages

Book excerpt: Speech recognition systems have been developed for the major languages of the world, yet for the majority of languages there are currently no large vocabulary continuous speech recognition (LVCSR) systems. The development of an LVCSR system for a new language is very costly, mainly because a large speech database has to be compiled to robustly capture the acoustic characteristics of the new language. This thesis investigates techniques that enable the re-use of acoustic information from a source language, in which a large amount of data is available, in implementing a system for a new target language. The assumption is that too little data is available in the target language to train a robust speech recognition system on that data alone, and that use of acoustic information from a source language can improve the performance of a target language recognition system. Strategies for cross-language use of acoustic information are proposed, including training on pooled source and target language data, adaptation of source language models using target language data, adapting multilingual models using target language data, and transforming source language data to augment target language data for model training. These strategies are allied with Bayesian and transformation-based techniques, usually used for speaker adaptation, as well as with discriminative learning techniques, to present a framework for cross-language re-use of acoustic information. Extensions to current adaptation techniques are proposed to improve the performance of these techniques specifically for cross-language adaptation.
A new technique for transformation-based adaptation of variance parameters and a cost-based extension of the minimum classification error (MCE) approach are proposed. Experiments are performed for a large number of approaches from the proposed framework for cross-language re-use of acoustic information. Relatively large amounts of English speech data are used in conjunction with smaller amounts of Afrikaans speech data to improve the performance of an Afrikaans speech recogniser. Results indicate that a significant reduction in word error rate (between 26% and 50%, depending on the amount of Afrikaans data available) is possible when English acoustic data is used in addition to Afrikaans speech data from the same database (i.e., both sets of data were recorded under the same conditions and the same labelling process was used). For same-database experiments, best results are achieved for approaches that train models on pooled source and target language data and then perform further adaptation of the models using Bayesian or discriminative techniques on target language data only. Experiments are also performed to evaluate the use of English data from a different database than the Afrikaans data. Peak reductions in word error rate of between 16% and 35% are delivered, depending on the amount of Afrikaans data available. Best results are achieved for an approach that performs a simple transformation of source model parameters using target language data, and then performs Bayesian adaptation of the transformed model on target language data.
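Since the results above are reported as word error rate reductions, a minimal sketch of how WER and a relative reduction are commonly computed may help. The function names are hypothetical, and the excerpt does not state whether its percentages are relative or absolute, so the relative form shown here is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, improved_wer):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, a system that lowers WER from 0.40 to 0.30 achieves a relative reduction of 0.25 under this definition.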

Computers

Speech to Speech Translation

Book Details:

Author : Yutaka Kidawara
Publisher : Springer Nature
Release : 2019-11-22
ISBN : 9811505950
Pages : 103 pages

Book excerpt: This book provides readers with retrospective and prospective views of speech-to-speech translation, with detailed explanations of its component technologies: speech recognition, language translation, and speech synthesis. A speech-to-speech translation (S2S) system breaks down language barriers, i.e., it enables communication between any pair of people on the globe, which is one of humankind's greatest dreams. People, society, and the economy connected by S2S will demonstrate explosive growth without exception. In 1986, Japan initiated basic research on S2S; the idea then spread worldwide and was explored in depth by researchers over three decades. Now, we see S2S applications on smartphones and tablets around the world. Computational resources such as processors, memory, and wireless communication accelerate these computation-intensive systems, and the accumulation of digital speech and language data encourages recent approaches based on machine learning. Through field experiments following long research in laboratories, S2S systems have become well developed and are now ready to be used in daily life. A unique chapter of this book is an end-to-end evaluation that compares system performance with human competence. The effectiveness of the system can be understood from the score of this evaluation. The book ends with a next focus of S2S: simultaneous interpretation technology for lectures, broadcast news, and so on.

Computers

Robust Adaptation to Non Native Accents in Automatic Speech Recognition

Book Details:

Author : Silke Goronzy
Publisher : Springer
Release : 2003-07-01
ISBN : 3540362908
Pages : 135 pages

Book excerpt: Speech recognition technology is being increasingly employed in human-machine interfaces. A remaining problem, however, is the robustness of this technology to non-native accents, which still cause considerable difficulties for current systems. In this book, methods to overcome this problem are described. A speaker adaptation algorithm based on the MLLR principle, capable of adapting to the current speaker with just a few words of speaker-specific data, is developed and combined with confidence measures that focus on phone durations as well as on acoustic features. Furthermore, a specific pronunciation modelling technique that allows the automatic derivation of non-native pronunciations without using non-native data is described and combined with the previous techniques to produce a robust adaptation to non-native accents in an automatic speech recognition system.
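The MLLR principle referred to above adapts the Gaussian mean vectors of an acoustic model with a shared affine transform, so that each adapted mean is A·μ + b. The sketch below shows only how such a transform is applied once it is known; estimating A and b from adaptation data by maximum likelihood is omitted, and the function name and example values are hypothetical rather than taken from the book.

```python
def apply_mllr_transform(means, A, b):
    """Apply the affine mean transform mu' = A @ mu + b to every mean vector.

    means: list of mean vectors (lists of floats), all of dimension d
    A:     d x d matrix as a list of rows
    b:     bias vector of length d
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


if __name__ == "__main__":
    # Illustrative values only: an identity rotation plus a shift.
    means = [[0.0, 0.0], [2.0, 3.0]]
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [1.0, -1.0]
    print(apply_mllr_transform(means, A, b))
```

Sharing one transform across many Gaussians is what allows adaptation from only a few words of speech: a handful of transform parameters are estimated instead of re-estimating every mean.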

Towards Multilingual Interoperability in Automatic Speech Recognition

Book Details:

Author :
Publisher :
Release : 2000
ISBN :
Pages : 9 pages

Book excerpt: In this communication, we address multilingual interoperability aspects in speech recognition. After giving a tentative definition of multilingual interoperability, we discuss speech recognition components and their language-specific aspects. We give a sample overview of past multilingual speech recognition research and development across different speaking styles (read, prepared and conversational). The problem of adaptation to new languages is addressed. Language-independent and cross-language techniques for acoustic modeling provide a means to port recognition systems to new languages without language-specific acoustic data. Pronunciation lexica and text material appear to be the most crucial language-dependent resources for porting. Fast porting being a step towards multilingual interoperability, the ongoing efforts of producing multilingual pronunciation lexica and collecting multilingual text corpora should be extended to the largest possible number of written languages.

Computers

Speech and Computer

Book Details:

Author : S. R. Mahadeva Prasanna
Publisher : Springer Nature
Release : 2022-11-12
ISBN : 303120980X
Pages : 737 pages

Book excerpt: This book constitutes the proceedings of the 24th International Conference on Speech and Computer, SPECOM 2022, held as a hybrid event in Gurugram, India, in November 2022. The 51 full and 9 short papers presented in this volume were carefully reviewed and selected from 99 submissions. The papers present current research in the area of computer speech processing including audio signal processing, automatic speech recognition, speaker recognition, computational paralinguistics, speech synthesis, sign language and multimodal processing, and speech and language resources.

Multilingual and crosslingual acoustic modelling for automatic speech recognition

Book Details:

Author : Frank Diehl
Publisher :
Release : 2007
ISBN :
Pages : 172 pages


Automatic speech recognition

Multilingual Speech Recognition

Book Details:

Author : Ulla Uebler
Publisher :
Release : 2000
ISBN : 9783897225022
Pages : 186 pages


Robust Speech Recognition for Low resource Languages

Book Details:

Author : Aleksei Romanenko
Publisher :
Release : 2020
ISBN :
Pages : pages


Real World Approaches for Multilingual and Non native Speech Recognition

Book Details:

Author : Martin Raab
Publisher :
Release : 2010
ISBN : 9783832524463
Pages : 0 pages

Book excerpt: In theory, multiple languages can be recognized just as one language. However, current state-of-the-art speech recognition systems are based on statistical models with many parameters. Extending such models to multiple languages requires more resources. Therefore, a lot of research in the area of multilingual speech recognition has proposed techniques to reduce this need for more resources through parameter tying across languages. This work shows that tying at the density level of Hidden Markov Model based speech recognizers offers the greatest flexibility for the design of a multilingual acoustic model. Furthermore, new algorithms are designed and tested for the fast and efficient creation of systems for many different language combinations. These algorithms are based on the addition of only relevant Gaussians and on the projection of a Gaussian mixture distribution onto new sets of Gaussians. The positive aspects of the architecture proposed in this work are that non-native accent recognition fruitfully applies knowledge about the mother language of the speakers and that an optimal resource allocation for each language can be guaranteed through an online adaptation to the current tasks.