EBookClubs

Read Books & Download eBooks Full Online

Book Improvement of Decoding Engine Phonetic Decision Tree in Acoustic Modeling for Online Large Vocabulary Conversational Speech Recognition

Download or read book Improvement of Decoding Engine Phonetic Decision Tree in Acoustic Modeling for Online Large Vocabulary Conversational Speech Recognition written by Jian Xue and published by . This book was released in 2007. Available in PDF, EPUB and Kindle. Book excerpt: In this work, new approaches are proposed for online large vocabulary conversational speech recognition, including a fast confusion network algorithm, novel features and a Random Forests based classifier for word confidence annotation, new improvements in speech decoding speed and latency, a novel lookahead method for phonetic decision tree state tying, and Random Forests of phonetic decision trees for acoustic modeling of speech sound units. The fast confusion network algorithm significantly improves the time complexity from O(T³) to O(T), where T is the number of links in a word lattice. Several novel features, as well as a Random Forests based classification technique, are proposed to improve word annotation accuracy for automatic captioning. To improve the speed of the speech decoding engine, we propose using complementary word confidence scores to prune uncompetitive search paths, and subspace distribution clustering hidden Markov modeling to speed up the computation of acoustic scores and local confidence scores. We further integrate pre-backtrace into the decoding search to significantly reduce captioning latency. This work also investigates novel approaches to improving the performance of phonetic decision tree state tying, including two lookahead methods and a Random Forests method. The constrained lookahead method finds an optimal question among n pre-selected questions for each split node to decrease the effect of outliers, and it also discounts the likelihood-gain contributions of deeper descendants.
The stochastic full lookahead method uses sub-tree size instead of likelihood gain as the measure for phonetic question selection, in order to produce small trees with better generalization capability and consistency with the training data. The Random Forests method uses an ensemble of phonetic decision trees to derive a single strong model for each speech unit. We investigate several methods of combining, during decoding search, the acoustic scores of the multiple models obtained from multiple phonetic decision trees. We further propose clustering methods that compact the Random Forests generated acoustic models to speed up decoding search.
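The score-combination step of the Random Forests method described above can be sketched in a few lines. This is a toy illustration, not the dissertation's implementation: it combines the per-frame log-likelihoods produced by the ensemble's tree models either by averaging in the log domain (a geometric mean of likelihoods) or by averaging the likelihoods themselves via log-sum-exp.

```python
import math

def combine_log_scores(log_scores, method="log_avg"):
    """Combine acoustic log-likelihoods for one frame and one speech unit,
    where each score comes from the model of a different phonetic
    decision tree in the ensemble."""
    m = len(log_scores)
    if method == "log_avg":
        # Average in the log domain (geometric mean of likelihoods).
        return sum(log_scores) / m
    if method == "prob_avg":
        # Arithmetic mean of likelihoods, computed stably via log-sum-exp.
        mx = max(log_scores)
        return mx + math.log(sum(math.exp(s - mx) for s in log_scores)) - math.log(m)
    raise ValueError(f"unknown method: {method}")

# Three hypothetical tree models scoring the same frame:
scores = [-12.0, -11.5, -13.2]
geometric = combine_log_scores(scores, "log_avg")
arithmetic = combine_log_scores(scores, "prob_avg")  # always >= geometric
```

Which combination rule behaves best inside decoding search is exactly the kind of question the excerpt says the work investigates empirically.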

Book Ensemble Methods in Large Vocabulary Continuous Speech Recognition

Download or read book Ensemble Methods in Large Vocabulary Continuous Speech Recognition written by Xin Chen and published by . This book was released in 2008. Available in PDF, EPUB and Kindle. Book excerpt: Combining a group of classifiers to improve overall classification performance is a young and promising direction in Large Vocabulary Continuous Speech Recognition (LVCSR). Previous work on acoustic modeling of speech signals, such as Random Forests (RFs) of Phonetic Decision Trees (PDTs), has produced significant improvements in word recognition accuracy. In this thesis, several new ensemble approaches are proposed for LVCSR, and experimental evaluations have shown absolute accuracy gains of up to 2.3% over conventional PDT-based acoustic models in our telehealth conversational speech recognition task. The word accuracy improvement achieved in this thesis work is significant, and the techniques have been integrated into the telemedicine automatic captioning system developed by the SLIPL group of the University of Missouri--Columbia.

Book Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition

Download or read book Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition written by Rusheng Hu and published by . This book was released in 2006. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation investigates the optimization of acoustic models in speech recognition. Two new optimization methods are proposed for phonetic decision tree (PDT) search and hidden Markov modeling (HMM): the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Investigations are conducted into applying both methods to improve the word error rate of a state-of-the-art speech recognition system. However, these two methods are developed against a general machine learning background, and their applications are not limited to speech recognition. The HMM gradient boosting method is based on a function approximation scheme that optimizes in function space rather than parameter space, exploiting the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme that can jointly optimize model structure and parameters. Experiments are conducted on the Wall Street Journal (WSJ) task, and good improvements in word error rate are observed. The knowledge-based adaptive PDT algorithm is developed under a trend toward knowledge-based systems and aims at optimizing the mapping from contextual phones to articulatory states by maximizing the implicit use of the phonological and phonetic information presumed to be contained in a large data corpus. A computationally efficient algorithm is developed to incorporate this prior knowledge in PDT construction. This algorithm is evaluated on the Telehealth conversational speech recognition task, and a significant improvement in system performance is achieved.

Book Pronunciation Modeling for Large Vocabulary Speech Recognition

Download or read book Pronunciation Modeling for Large Vocabulary Speech Recognition written by Arthur Kantor and published by . This book was released in 2011. Available in PDF, EPUB and Kindle. Book excerpt: The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of units used to define the pronunciations of words. Other approaches model pronunciation implicitly by using long-duration acoustic context to more accurately classify the spoken pronunciation unit. This thesis is a study of the relative ability of the acoustic and pronunciation models to capture pronunciation variability in a nearly state-of-the-art conversational telephone speech recognizer. Several methods are tested, each designed to improve the modeling accuracy of the recognizer. Some of the experiments result in a lower word error rate, but many do not, apparently because, in different ways, the accuracy gained by one part of the recognizer comes at the expense of accuracy lost or transferred from another part. Pronunciation variability is modeled with two approaches: from above, with explicit pronunciation modeling, and from below, with implicit pronunciation modeling within the acoustic model. Both approaches make use of long-duration context: explicitly, by considering long-duration pronunciation units, and implicitly, by having the acoustic model consider long-duration speech segments. Some pronunciation models address the pronunciation variability problem by introducing multiple pronunciations per word to cover more of the variants observed in conversational speech. However, this can potentially increase the confusability between words. This thesis studies the relationship between pronunciation perplexity and lexical ambiguity, which has informed the design of the explicit pronunciation models presented here.

Book Subphonetic Acoustic Modeling for Speaker independent Continuous Speech Recognition

Download or read book Subphonetic Acoustic Modeling for Speaker independent Continuous Speech Recognition written by and published by . This book was released on 1993 with total page 176 pages. Available in PDF, EPUB and Kindle. Book excerpt: To model the acoustics of a large vocabulary well while staying within a reasonable memory capacity, most speech recognition systems use phonetic models to share parameters across different words in the vocabulary. This dissertation investigates the merits of modeling at the subphonetic level. We demonstrate that sharing parameters at the subphonetic level provides more accurate acoustic models than sharing at the phonetic level. The concept of subphonetic parameter sharing can be applied to any class of parametric models. Since the first-order hidden Markov model (HMM) has been overwhelmingly successful in speech recognition, this dissertation bases all its studies and experiments on HMMs. The subphonetic unit we investigate is the state of phonetic HMMs. We develop a system in which similar Markov states of phonetic models share the same Markov parameters. The shared parameter (i.e., the output distribution) associated with a cluster of similar states is called a senone because of its state dependency. The phonetic models that share senones are shared-distribution models or SDMs. Experiments show that SDMs offer more accurate acoustic models than the generalized-triphone model presented by Lee. Senones are next applied to offer accurate models for triphones not experienced in the system training data. In this dissertation, two approaches for modeling unseen triphones are studied: purely decision-tree-based senones and a hybrid approach using the concept of Markov state quantization. Both approaches indeed offer a significant error reduction over the previously accepted approach of monophone model substitution. However, the purely decision-tree-based senone approach is preferred for its simplicity.
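The decision-tree-based senone idea for unseen triphones can be illustrated with a toy tree. The phone classes, questions, and senone ids below are invented for the example; the point is only that a triphone absent from the training data still reaches a leaf, because the questions ask about phone classes of the context rather than about whole triphones.

```python
# A toy phonetic decision tree. Internal nodes ask yes/no questions about
# the context phones of a triphone (left, center, right); leaves carry
# senone ids. Phone classes and tree shape are invented for illustration.
NASALS = {"m", "n", "ng"}
VOWELS = {"aa", "iy", "uw", "eh"}

def left_is_nasal(tri):
    return tri[0] in NASALS

def right_is_vowel(tri):
    return tri[2] in VOWELS

# Node = (question, yes_subtree, no_subtree); a bare int is a leaf senone id.
TREE = (left_is_nasal, (right_is_vowel, 0, 1), 2)

def senone_for(tri, node=TREE):
    """Walk the tree with the triphone's context. This works even for
    triphones never seen in training, since every possible context
    answers every phone-class question."""
    if isinstance(node, int):
        return node
    question, yes, no = node
    return senone_for(tri, yes if question(tri) else no)
```

Each leaf's senone (its shared output distribution) is then used by every triphone state routed to that leaf, which is the parameter sharing the excerpt describes.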

Book Multi level Acoustic Modeling for Automatic Speech Recognition

Download or read book Multi level Acoustic Modeling for Automatic Speech Recognition written by Hung-An Chang (Ph. D.) and published by . This book was released on 2012 with total page 192 pages. Available in PDF, EPUB and Kindle. Book excerpt: Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model the coarticulatory variations that occur during speech production. Typically, the local phoneme context is used to define context-dependent units. Because the number of possible context-dependent units can grow exponentially with the length of the contexts, many units will not have enough training examples to train a robust model, resulting in a data sparsity problem. For nearly two decades, this data sparsity problem has been dealt with by a clustering-based framework that systematically groups different context-dependent units into clusters so that each cluster has enough data. Although it deals with the data sparsity issue, the clustering-based approach also forces all context-dependent units within a cluster to share the same acoustic score, resulting in a quantization effect that can potentially limit the performance of the context-dependent model. In this work, a multi-level acoustic modeling framework is proposed to address both the data sparsity problem and the quantization effect. Under the multi-level framework, each context-dependent unit is associated with classifiers that target multiple levels of contextual resolution, and the outputs of the classifiers are linearly combined for scoring during recognition. By choosing the classifiers judiciously, both the data sparsity problem and the quantization effect can be dealt with.
The proposed multi-level framework can also be integrated into existing large-vocabulary ASR systems, such as FST-based ones, and is compatible with state-of-the-art error reduction techniques such as discriminative training methods. Multiple sets of experiments have been conducted to compare the performance of the clustering-based acoustic model and the proposed multi-level model. In a phonetic recognition experiment on TIMIT, the multi-level model shows about 8% relative improvement in phone error rate, showing that the multi-level framework can help improve phonetic prediction accuracy. In a large-vocabulary transcription task, combining the proposed multi-level modeling framework with discriminative training provides more than 20% relative improvement in Word Error Rate (WER) over a clustering baseline model, showing that the multi-level framework can be integrated into existing large-vocabulary decoding frameworks and combines well with discriminative training methods. In a speaker-adaptive transcription task, the multi-level model shows about 14% relative WER improvement, indicating that the proposed framework can adapt better to new speakers, and potentially to new environments, than the conventional clustering-based approach.
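The linear combination at the heart of the multi-level framework can be sketched as follows. The levels, scores, and weights here are made up for illustration; in the actual system the per-level classifiers and their combination weights are trained.

```python
def multi_level_score(level_scores, weights):
    """Linearly combine log-domain classifier scores computed at several
    contextual resolutions (coarse monophone level -> fine triphone level)
    for one context-dependent unit on one frame."""
    assert len(level_scores) == len(weights)
    return sum(w * s for w, s in zip(weights, level_scores))

# Hypothetical scores: a robustly trained coarse score, a medium-resolution
# score, and a sparsely trained fine score for the same unit and frame.
coarse, medium, fine = -10.0, -9.0, -8.5
combined = multi_level_score([coarse, medium, fine], [0.2, 0.3, 0.5])
```

Because the coarse levels always have ample data while the fine levels carry the full contextual detail, the weighted sum mitigates both the data sparsity problem and the quantization effect that the excerpt describes.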

Book A Study on the Integration of Phonetic Landmarks Into Large Vocabulary Continuous Speech Decoding

Download or read book A Study on the Integration of Phonetic Landmarks Into Large Vocabulary Continuous Speech Decoding written by Stefan Ziegler and published by . This book was released on 2014 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: This thesis studies the integration of phonetic landmarks into standard statistical large vocabulary continuous speech recognition (LVCSR). Landmarks are discrete time instances that indicate the presence of phonetic events in the speech signal. The goal is to develop landmark detectors that are motivated by phonetic knowledge, in order to model selected phonetic classes more precisely than is possible with standard acoustic models. The thesis presents two landmark detection approaches that make use of segment-based information, and studies two different methods to integrate landmarks into decoding: landmark-based pruning and a weighted combination approach. While both approaches improve speech recognition performance over the baseline when landmark and acoustic scores are combined with weights during decoding, they do not outperform standard frame-based phonetic predictions. Since these results indicate that landmark-driven LVCSR requires the integration of very heterogeneous information, the thesis presents a third integration framework designed to integrate an arbitrary number of heterogeneous and asynchronous landmark streams into LVCSR. The results indicate that this framework is indeed able to improve the baseline system, as soon as the landmarks provide information complementary to the regular acoustic models.

Book Computational Models of American Speech

Download or read book Computational Models of American Speech written by M. Margaret Withgott and published by Center for the Study of Language (CSLI). This book was released on 1993 with total page 168 pages. Available in PDF, EPUB and Kindle. Book excerpt: A new perspective on phonetic variation is achieved in this volume through the construction of a series of models of spoken American English. In the past, computer theorists and programmers investigating pronunciation have often relied on their own knowledge of the language or on limited transcription data. Speech recognition researchers, on the other hand, have drawn on a great deal of data but without examining in detail the information about pronunciation the data contains. The authors combine the best of each approach to develop probabilistic and rule-based computational models of transcription data. An ongoing controversy in studies of phonetic variation is the existence and proper definition of a phonetic unit. The authors argue that assumptions about the units of spoken language are critical to a computational model. Their computational models employ suprasegmental elements such as syllable boundaries, stress, and position in a unit called a metrical foot. The use of such elements in modeling data enables the creation of better computational models for both recognition and synthesis technology. This book should be of interest to speech engineers, linguists, and anyone who wishes to understand symbolic systems of communication.

Book High Accuracy Large Vocabulary Speech Recognition Using Mixture Tying and Consistency Modeling

Download or read book High Accuracy Large Vocabulary Speech Recognition Using Mixture Tying and Consistency Modeling written by and published by . This book was released on 1994 with total page 7 pages. Available in PDF, EPUB and Kindle. Book excerpt: Improved acoustic modeling can significantly decrease the error rate in large-vocabulary speech recognition. Our approach to the problem is twofold. We first propose a scheme that optimizes the degree of mixture tying for a given amount of training data and computational resources. Experimental results on the Wall Street Journal (WSJ) Corpus show that this new form of output distribution achieves a 25% reduction in error rate over typical tied-mixture systems. We then show that an additional improvement can be achieved by modeling local time correlation with linear discriminant features.
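The tied-mixture output distribution whose degree of tying the excerpt optimizes can be sketched like this: all HMM states share one Gaussian codebook and differ only in their mixture weights. The codebook values are invented, and the densities are one-dimensional for brevity (real systems use multivariate Gaussians over feature vectors).

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Shared codebook of Gaussians: every HMM state reuses these densities
# and differs only in its mixture weights -- the tied-mixture scheme.
CODEBOOK = [(0.0, 1.0), (2.0, 0.5), (-1.5, 2.0)]  # (mean, variance)

def state_loglik(x, weights):
    """log b_j(x) = log sum_k c_jk N(x; mu_k, var_k), computed stably
    in the log domain; c_jk are the state's mixture weights."""
    terms = [math.log(c) + gaussian_logpdf(x, m, v)
             for c, (m, v) in zip(weights, CODEBOOK) if c > 0]
    mx = max(terms)
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

# Two states sharing the codebook but with different weights:
state_a = [0.7, 0.2, 0.1]
state_b = [0.1, 0.8, 0.1]
```

Varying how many states share a codebook (from one global codebook to fully state-specific mixtures) is the degree-of-tying trade-off the excerpt tunes against the available training data.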

Book Language Modeling for Automatic Speech Recognition of Inflective Languages

Download or read book Language Modeling for Automatic Speech Recognition of Inflective Languages written by Gregor Donaj and published by Springer. This book was released on 2016-08-29 with total page 77 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the number of spoken languages.

Book Real time Speaker independent Large Vocabulary Continuous Speech Recognition

Download or read book Real time Speaker independent Large Vocabulary Continuous Speech Recognition written by and published by . This book was released in 2005. Available in PDF, EPUB and Kindle. Book excerpt: In this dissertation, a real-time decoding engine for speaker-independent large vocabulary continuous speech recognition (LVCSR) is presented. Three indispensable and correlated performance measurements -- accuracy, speed, and memory cost -- are carefully considered in the system design. A novel algorithm, Order-Preserving Language Model Context Pre-computing (OPCP), is proposed for fast Language Model (LM) lookup, resulting in significant improvements in both overall decoding time and memory space without any decrease in recognition accuracy. The time and memory savings in LM lookup from OPCP become more pronounced as the LM size increases. Using the OPCP method and other optimizations, our one-pass LVCSR decoding engine, named TigerEngine, reached real-time speed on both the Wall Street Journal 20K and Switchboard 33K tasks, on a Dell workstation with one 3.2 GHz Xeon CPU. TigerEngine is to be used in automatic captioning for Telehealth.
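The exact OPCP data layout is specific to the dissertation, but the general idea of pre-computing per-context LM structures for fast lookup can be sketched as follows. Everything here is a hypothetical toy layout: each history's successor word ids are stored in sorted order, so a probe is one binary search plus standard back-off.

```python
from bisect import bisect_left

# Hypothetical pre-computed tables: for each history, the successor word
# ids sorted ascending, a parallel array of log-probabilities, and a
# back-off weight. Word ids and values are invented for the example.
PRECOMPUTED = {
    ("the",): ([3, 7, 42], [-0.5, -1.2, -2.0], -0.3),
    (): ([3, 7, 9, 42], [-2.0, -2.5, -3.0, -3.5], 0.0),
}

def logprob(history, word):
    """One binary search per probe; on a miss, back off to the shorter
    history and add the back-off weight, as in a standard back-off LM."""
    ids, logprobs, backoff = PRECOMPUTED[history]
    i = bisect_left(ids, word)
    if i < len(ids) and ids[i] == word:
        return logprobs[i]
    if not history:
        raise KeyError(f"word {word} not in unigram table")
    return backoff + logprob(history[1:], word)
```

The pay-off the excerpt reports (savings growing with LM size) is consistent with replacing per-n-gram hashing by contiguous, order-preserving arrays, though the dissertation should be consulted for the actual mechanism.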

Book The Integration of Phonetic Knowledge in Speech Technology

Download or read book The Integration of Phonetic Knowledge in Speech Technology written by William J. Barry and published by Springer Science & Business Media. This book was released on 2006-03-30 with total page 188 pages. Available in PDF, EPUB and Kindle. Book excerpt: Continued progress in Speech Technology in the face of ever-increasing demands on the performance levels of applications is a challenge to the whole speech and language science community. Robust recognition and understanding of spontaneous speech in varied environments, good comprehensibility and naturalness of expressive speech synthesis are goals that cannot be achieved without a change of paradigm. This book argues for interdisciplinary communication and cooperation in problem-solving in general, and discusses the interaction between speech and language engineering and phonetics in particular. With a number of reports on innovative speech technology research as well as more theoretical discussions, it addresses the practical, scientific and sometimes the philosophical problems that stand in the way of cross-disciplinary collaboration and illuminates some of the many possible ways forward. Audience: Researchers and professionals in speech technology and computational linguists.

Book Data Driven Techniques in Speech Synthesis

Download or read book Data Driven Techniques in Speech Synthesis written by R.I. Damper and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 328 pages. Available in PDF, EPUB and Kindle. Book excerpt: This first review of a new field covers all areas of speech synthesis from text, ranging from text analysis to letter-to-sound conversion. At the leading edge of current research, the concise and accessible book is written by well respected experts in the field.

Book High Performance Speech Recognition Using Consistency Modeling

Download or read book High Performance Speech Recognition Using Consistency Modeling written by and published by . This book was released on 1993 with total page 11 pages. Available in PDF, EPUB and Kindle. Book excerpt: The goal of SRI's consistency modeling project is to improve the raw acoustic modeling component of SRI's DECIPHER speech recognition system and to develop consistency modeling technology. Consistency modeling aims to reduce the number of improper independence assumptions used in traditional speech recognition algorithms, so that the resulting speech recognition hypotheses are more self-consistent and, therefore, more accurate. At the initial stages of this effort, SRI focused on developing the appropriate base technologies for consistency modeling. We first developed the Progressive Search technology that allowed us to perform large-vocabulary continuous speech recognition (LVCSR) experiments. Since its conception and development at SRI, this technique has been adopted by most laboratories, including other ARPA contracting sites, doing research on LVCSR. Another goal of the consistency modeling project is to attack difficult modeling problems where there is a mismatch between the training and testing phases. Such mismatches may include outlier speakers, different microphones, and additive noise. We were able either to develop new technologies, or to transfer and evaluate existing ones, that adapted our baseline genonic HMM recognizer to such difficult conditions.

Book Aspects of Speech Recognition by Computer

Download or read book Aspects of Speech Recognition by Computer written by Pierre Jules Louis Edmond Vicens and published by . This book was released on 1969 with total page 250 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Adaptive Vocabularies in Large Vocabulary Conversational Speech Recognition

Download or read book Adaptive Vocabularies in Large Vocabulary Conversational Speech Recognition written by Petra Geutner and published by . This book was released on 2000 with total page 192 pages. Available in PDF, EPUB and Kindle. Book excerpt: