[EBOOK] Statistical Optimization Of Acoustic Models For Large Vocabulary Speech Recognition PDF Download

Electronic dissertations

Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition

Book Details:

Author : Rusheng Hu
Publisher :
Release : 2006
ISBN :
Pages : pages

Download or read book Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition written by Rusheng Hu and published by . This book was released on 2006 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation investigates optimization of acoustic models in speech recognition. Two new optimization methods are proposed for phonetic decision tree (PDT) search and Hidden Markov modeling (HMM)-- the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Investigations are conducted to applying both methods to improve word error rate of the state-of-the-art speech recognition system. However, these two methods are developed in a general machine learning background and their applications are not limited to speech recognition. The HMM gradient boosting method is based on a function approximation scheme from the perspective of optimization in function space rather than the parameter space, based on the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme which can jointly optimize model structure and parameters. Experiments are conducted on the World Street Journal (WSJ) task and good improvements on word error rate are observed. The knowledge-based adaptive PDT algorithm is developed under a trend toward knowledge-based systems and aims at optimizing the mapping from contextual phones to articulatory states by maximizing implicit usage of the phonological and phonetic information, which is presumed to be contained in large data corpus. A computational efficient algorithm is developed to incorporate this prior knowledge in PDT construction. This algorithm is evaluated on the Telehealth conversational speech recognition and significant improvement on system performance is achieved.

Language Arts & Disciplines

Statistical Methods for Speech Recognition

Book Details:

Author : Frederick Jelinek
Publisher : MIT Press
Release : 1998-01-15
ISBN : 9780262100663
Pages : 324 pages

Download or read book Statistical Methods for Speech Recognition written by Frederick Jelinek and published by MIT Press. This book was released on 1998-01-15 with total page 324 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author's goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques.

Reducing Development Costs of Large Vocabulary Speech Recognition Systems

Book Details:

Author : Thiago Fraga Da Silva
Publisher :
Release : 2014
ISBN :
Pages : 0 pages

Download or read book Reducing Development Costs of Large Vocabulary Speech Recognition Systems written by Thiago Fraga Da Silva and published by . This book was released on 2014 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: One of the outstanding challenges in large vocabulary automatic speech recognition (ASR) is the reduction of development costs required to build a new recognition system or adapt an existing one to a new task, language or dialect. The state-of-the-art ASR systems are based on the principles of the statistical learning paradigm, using information provided by two stochastic models, an acoustic (AM) and a language (LM) model. The standard methods used to estimate the parameters of such models are founded on two main assumptions : the training data sets are large enough, and the training data match well the target task. It is well-known that a great part of system development costs is due to the construction of corpora that fulfill these requirements. In particular, manually transcribing the audio data is the most expensive and time-consuming endeavor. For some applications, such as the recognition of low resourced languages or dialects, finding and collecting data is also a hard (and expensive) task. As a means to lower the cost required for ASR system development, this thesis proposes and studies methods that aim to alleviate the need for manually transcribing audio data for a given target task. Two axes of research are explored. First, unsupervised training methods are explored in order to build three of the main components of ASR systems : the acoustic model, the multi-layer perceptron (MLP) used to extract acoustic features and the language model. The unsupervised training methods aim to estimate the model parameters using a large amount of automatically (and inaccurately) transcribed audio data, obtained thanks to an existing recognition system. A novel method for unsupervised AM training that copes well with the automatic audio transcripts is proposed : the use of multiple recognition hypotheses (rather than the best one) leads to consistent gains in performance over the standard approach. Unsupervised MLP training is proposed as an alternative to build efficient acoustic models in a fully unsupervised way. Compared to cross-lingual MLPs trained in a supervised manner, the unsupervised MLP leads to competitive performance levels even if trained on only about half of the data amount. Unsupervised LM training approaches are proposed to estimate standard back-off n-gram and neural network language models. It is shown that unsupervised LM training leads to additive gains in performance on top of unsupervised AM training. Second, this thesis proposes the use of model interpolation as a rapid and flexible way to build task specific acoustic models. In reported experiments, models obtained via interpolation outperform the baseline pooled models and equivalent maximum a posteriori (MAP) adapted models. Interpolation proves to be especially useful for low resourced dialect ASR. When only a few (2 to 3 hours) or no acoustic data truly matching the target dialect are available for AM training, model interpolation leads to substantial performance gains compared to the standard training methods.

Technology & Engineering

Dynamic Speech Models

Book Details:

Author : Li Deng
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031025555
Pages : 105 pages

Download or read book Dynamic Speech Models written by Li Deng and published by Springer Nature. This book was released on 2022-05-31 with total page 105 pages. Available in PDF, EPUB and Kindle. Book excerpt: Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech “chain” starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide a comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role of speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process. What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all the four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain. Such scientific studies help understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially that in automatic recognition of natural-style human speech is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and are well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, due to a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state-of-the-art. For example, while the dynamic and correlation modeling is known to be an important topic, most of the systems nevertheless employ only an ultra-weak form of speech dynamics; e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem. After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning over past 20~years. This monograph is intended as advanced materials of speech and signal processing for graudate-level teaching, for professionals and engineering practioners, as well as for seasoned researchers and engineers specialized in speech processing

Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition

Book Details:

Author : Christian Gollan
Publisher :
Release : 2014
ISBN :
Pages : 0 pages

Download or read book Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition written by Christian Gollan and published by . This book was released on 2014 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Technology & Engineering

Automatic Speech and Speaker Recognition

Book Details:

Author : Joseph Keshet
Publisher : John Wiley & Sons
Release : 2009-04-27
ISBN : 9780470742037
Pages : 268 pages

Download or read book Automatic Speech and Speaker Recognition written by Joseph Keshet and published by John Wiley & Sons. This book was released on 2009-04-27 with total page 268 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book discusses large margin and kernel methods for speech and speaker recognition Speech and Speaker Recognition: Large Margin and Kernel Methods is a collation of research in the recent advances in large margin and kernel methods, as applied to the field of speech and speaker recognition. It presents theoretical and practical foundations of these methods, from support vector machines to large margin methods for structured learning. It also provides examples of large margin based acoustic modelling for continuous speech recognizers, where the grounds for practical large margin sequence learning are set. Large margin methods for discriminative language modelling and text independent speaker verification are also addressed in this book. Key Features: Provides an up-to-date snapshot of the current state of research in this field Covers important aspects of extending the binary support vector machine to speech and speaker recognition applications Discusses large margin and kernel method algorithms for sequence prediction required for acoustic modeling Reviews past and present work on discriminative training of language models, and describes different large margin algorithms for the application of part-of-speech tagging Surveys recent work on the use of kernel approaches to text-independent speaker verification, and introduces the main concepts and algorithms Surveys recent work on kernel approaches to learning a similarity matrix from data This book will be of interest to researchers, practitioners, engineers, and scientists in speech processing and machine learning fields.

Technology & Engineering

Statistical Pronunciation Modeling for Non Native Speech Processing

Book Details:

Author : Rainer E. Gruhn
Publisher : Springer Science & Business Media
Release : 2011-05-08
ISBN : 3642195865
Pages : 118 pages

Download or read book Statistical Pronunciation Modeling for Non Native Speech Processing written by Rainer E. Gruhn and published by Springer Science & Business Media. This book was released on 2011-05-08 with total page 118 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this work, the authors present a fully statistical approach to model non--native speakers' pronunciation. Second-language speakers pronounce words in multiple different ways compared to the native speakers. Those deviations, may it be phoneme substitutions, deletions or insertions, can be modelled automatically with the new method presented here. The methods is based on a discrete hidden Markov model as a word pronunciation model, initialized on a standard pronunciation dictionary. The implementation and functionality of the methodology has been proven and verified with a test set of non-native English in the regarding accent. The book is written for researchers with a professional interest in phonetics and automatic speech and speaker recognition.

Technology & Engineering

Incorporating Knowledge Sources into Statistical Speech Recognition

Book Details:

Author : Sakriani Sakti
Publisher : Springer Science & Business Media
Release : 2009-02-27
ISBN : 038785830X
Pages : 207 pages

Download or read book Incorporating Knowledge Sources into Statistical Speech Recognition written by Sakriani Sakti and published by Springer Science & Business Media. This book was released on 2009-02-27 with total page 207 pages. Available in PDF, EPUB and Kindle. Book excerpt: Incorporating Knowledge Sources into Statistical Speech Recognition addresses the problem of developing efficient automatic speech recognition (ASR) systems, which maintain a balance between utilizing a wide knowledge of speech variability, while keeping the training / recognition effort feasible and improving speech recognition performance. The book provides an efficient general framework to incorporate additional knowledge sources into state-of-the-art statistical ASR systems. It can be applied to many existing ASR problems with their respective model-based likelihood functions in flexible ways.

Automatic speech recognition

Statistical Language Modelling for Large Vocabulary Speech Recognition

Book Details:

Author : Michael McGreevy
Publisher :
Release : 2005
ISBN :
Pages : 75 pages

Download or read book Statistical Language Modelling for Large Vocabulary Speech Recognition written by Michael McGreevy and published by . This book was released on 2005 with total page 75 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Automatic speech recognition

Speech Recognition Using Neural Networks

Book Details:

Author : Joe Tebelskis
Publisher :
Release : 1995
ISBN :
Pages : 180 pages

Download or read book Speech Recognition Using Neural Networks written by Joe Tebelskis and published by . This book was released on 1995 with total page 180 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by a NN-HMM hybrid that we developed, based on context-independent phoneme models, that achieved 90.5% word accuracy on the Resource Management database, in contrast to only 86.0% accuracy achieved by a pure HMM under similar conditions. In the course of developing this system, we explored two different ways to use neural networks for acoustic modeling: prediction and classification. We found that predictive networks yield poor results because of a lack of discrimination, but classification networks gave excellent results. We verified that, in accordance with theory, the output activations of a classification network form highly accurate estimates of the posterior probabilities P(class/input), and we showed how these can easily be converted to likelihoods P(input/class) for standard HMM recognition algorithms. Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network's output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word level outputs, and using gender dependent networks."

Computers

Statistical Language and Speech Processing

Book Details:

Author : Nathalie Camelin
Publisher : Springer
Release : 2017-10-09
ISBN : 3319684566
Pages : 282 pages

Download or read book Statistical Language and Speech Processing written by Nathalie Camelin and published by Springer. This book was released on 2017-10-09 with total page 282 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 5th International Conference on Statistical Language and Speech Processing, SLSP 2017, held in Le Mans, France, in October 2017. The 21 full papers presented were carefully reviewed and selected from 39 submissions. The papers cover topics such as anaphora and conference resolution; authorship identification, plagiarism and spam filtering; computer-aided translation; corpora and language resources; data mining and semanticweb; information extraction; information retrieval; knowledge representation and ontologies; lexicons and dictionaries; machine translation; multimodal technologies; natural language understanding; neural representation of speech and language; opinion mining and sentiment analysis; parsing; part-of-speech tagging; question and answering systems; semantic role labeling; speaker identification and verification; speech and language generation; speech recognition; speech synthesis; speech transcription; speech correction; spoken dialogue systems; term extraction; text categorization; test summarization; user modeling. They are organized in the following sections: language and information extraction; post-processing and applications of automatic transcriptions; speech paralinguistics and synthesis; speech recognition: modeling and resources.

Hierarchical Connectionist Acoustic Modeling for Domain Adaptive Large Vocabulary Speech Recognition

Book Details:

Author : Jürgen Fritsch
Publisher :
Release : 2000
ISBN : 9783826573101
Pages : 216 pages

Download or read book Hierarchical Connectionist Acoustic Modeling for Domain Adaptive Large Vocabulary Speech Recognition written by Jürgen Fritsch and published by . This book was released on 2000 with total page 216 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Natural Statistical Models for Automatic Speech Recognition

Book Details:

Author : Jeff Bilmes
Publisher :
Release : 1999
ISBN :
Pages : 446 pages

Download or read book Natural Statistical Models for Automatic Speech Recognition written by Jeff Bilmes and published by . This book was released on 1999 with total page 446 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Technology & Engineering

Discriminative Learning for Speech Recognition

Book Details:

Author : Xiadong He
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031025571
Pages : 112 pages

Download or read book Discriminative Learning for Speech Recognition written by Xiadong He and published by Springer Nature. This book was released on 2022-06-01 with total page 112 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth transformation (or extended Baum–Welch) optimization framework in discriminative learning of model parameters. In addition to all the necessary introduction of the background and tutorial material on the subject, we also included technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained by the authors in firsthand are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues with practical significance are provided to enable the practitioners to directly reproduce the theory in the earlier part of the book into engineering practice. Table of Contents: Introduction and Background / Statistical Speech Recognition: A Tutorial / Discriminative Learning: A Unified Objective Function / Discriminative Learning Algorithm for Exponential-Family Distributions / Discriminative Learning Algorithm for Hidden Markov Model / Practical Implementation of Discriminative Learning / Selected Experimental Results / Epilogue / Major Symbols Used in the Book and Their Descriptions / Mathematical Notation / Bibliography

Technology & Engineering

Automatic Speech Recognition

Book Details:

Author : Dong Yu
Publisher : Springer
Release : 2014-11-11
ISBN : 1447157796
Pages : 329 pages

Download or read book Automatic Speech Recognition written by Dong Yu and published by Springer. This book was released on 2014-11-11 with total page 329 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a comprehensive overview of the recent advancement in the field of automatic speech recognition with a focus on deep learning models including deep neural networks and many of their variants. This is the first automatic speech recognition book dedicated to the deep learning approach. In addition to the rigorous mathematical treatment of the subject, the book also presents insights and theoretical foundation of a series of highly successful deep learning models.

Technology & Engineering

Language Modeling for Automatic Speech Recognition of Inflective Languages

Book Details:

Author : Gregor Donaj
Publisher : Springer
Release : 2016-08-29
ISBN : 3319416073
Pages : 77 pages

Download or read book Language Modeling for Automatic Speech Recognition of Inflective Languages written by Gregor Donaj and published by Springer. This book was released on 2016-08-29 with total page 77 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the number of spoken languages.

Computers

Statistical Language and Speech Processing

Book Details:

Author : Thierry Dutoit
Publisher : Springer
Release : 2018-10-08
ISBN : 303000810X
Pages : 191 pages

Download or read book Statistical Language and Speech Processing written by Thierry Dutoit and published by Springer. This book was released on 2018-10-08 with total page 191 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the proceedings of the 6th International Conference on Statistical Language and Speech Processing, SLSP 2018, held in Mons, Belgium, in October 2018. The 15 full papers presented in this volume were carefully reviewed and selected from 40 submissions. They were organized in topical sections named: speech synthesis and spoken language generation; speech recognition and post-processing; natural language processing and understanding; and text processing and analysis.