EBookClubs

Read Books & Download eBooks Full Online

Book Invariant Features and Enhanced Speaker Normalization for Automatic Speech Recognition

Download or read book Invariant Features and Enhanced Speaker Normalization for Automatic Speech Recognition written by Florian Müller and published by Logos Verlag Berlin GmbH. This book was released on 2013 with total page 247 pages. Available in PDF, EPUB and Kindle. Book excerpt: Automatic speech recognition systems have to handle various kinds of variabilities sufficiently well in order to achieve high recognition rates in practice. One of the variabilities that has a major impact on the performance is the vocal tract length of the speakers. Normalization of the features and adaptation of the acoustic models are commonly used methods in speech recognition systems. In contrast to that, a third approach follows the idea of extracting features with transforms that are invariant to vocal tract length changes. This work presents several approaches for extracting invariant features for automatic speech recognition systems. The robustness of these features under various training-test conditions is evaluated, and it is described how the robustness of the features to noise can be increased. Furthermore, it is shown how the spectral effects due to different vocal tract lengths can be estimated with a registration method and how this can be used for speaker normalization.
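
The speaker-normalization idea summarized above is often illustrated with vocal tract length normalization by frequency warping. The sketch below is not taken from Müller's book; it assumes a simple piecewise-linear warp with a hypothetical warp factor `alpha` applied to a single magnitude-spectrum frame.

```python
import numpy as np

def warp_frequencies(freqs, alpha, f_max, f_cut_ratio=0.8):
    """Piecewise-linear frequency warp commonly used for VTLN-style
    speaker normalization (illustrative, not the book's method).

    Frequencies below f_cut are scaled by alpha; above f_cut the warp
    is linear so that f_max maps onto itself."""
    f_cut = f_cut_ratio * f_max
    return np.where(
        freqs <= f_cut,
        alpha * freqs,
        alpha * f_cut + (f_max - alpha * f_cut) / (f_max - f_cut) * (freqs - f_cut),
    )

def warp_spectrum(mag_spec, alpha, sample_rate):
    """Resample one magnitude-spectrum frame onto a warped frequency axis."""
    n_bins = mag_spec.shape[0]
    freqs = np.linspace(0.0, sample_rate / 2, n_bins)
    warped = warp_frequencies(freqs, alpha, f_max=sample_rate / 2)
    # Read the original spectrum at the warped frequency positions.
    return np.interp(warped, freqs, mag_spec)

# Example: warp one frame of a magnitude spectrum (random stand-in data).
rng = np.random.default_rng(0)
frame = np.abs(rng.normal(size=257))          # stand-in for an FFT magnitude frame
normalized = warp_spectrum(frame, alpha=0.92, sample_rate=16000)
print(normalized.shape)
```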

Book Automatic Speech and Speaker Recognition

Download or read book Automatic Speech and Speaker Recognition written by Chin-Hui Lee and published by Springer Science & Business Media. This book was released on 1996-03-31 with total page 548 pages. Available in PDF, EPUB and Kindle. Book excerpt: Research in the field of automatic speech and speaker recognition has made a number of significant advances in the last two decades, influenced by advances in signal processing, algorithms, architectures, and hardware. These advances include: the adoption of a statistical pattern recognition paradigm; the use of the hidden Markov modeling framework to characterize both the spectral and the temporal variations in the speech signal; the use of a large set of speech utterance examples from a large population of speakers to train the hidden Markov models of some fundamental speech units; the organization of speech and language knowledge sources into a structural finite state network; and the use of dynamic programming-based heuristic search methods to find the best word sequence in the lexical network corresponding to the spoken utterance. Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks. Although no explicit partition is given, the book is divided into five parts: Chapters 1-2 are devoted to technology overviews; Chapters 3-12 discuss acoustic modeling of fundamental speech units and lexical modeling of words and pronunciations; Chapters 13-15 address the issues related to flexibility and robustness; Chapters 16-18 concern the theoretical and practical issues of search; Chapters 19-20 give two examples of algorithmic and implementational aspects for recognition system realization. Audience: A reference book for speech researchers and graduate students interested in pursuing potential research on the topic. May also be used as a text for advanced courses on the subject.
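
The "dynamic programming-based heuristic search" mentioned in the excerpt is, at its core, Viterbi decoding over hidden Markov model states. A minimal, self-contained sketch on a toy two-state model (illustrative only, not code from the book) is shown below.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Find the most likely state sequence of an HMM (toy example).

    log_init  : (S,)   log initial state probabilities
    log_trans : (S, S) log transition probabilities
    log_emit  : (T, S) log emission scores for each observed frame
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]                 # best score ending in each state
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # (S, S): previous state -> current state
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace back the best path from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy 2-state model with 3 frames of (already computed) log emission scores.
log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_emit = np.log([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
print(viterbi(log_init, log_trans, log_emit))      # [0, 1, 1]
```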

Book Fundamentals in Computer Understanding: Speech and Vision

Download or read book Fundamentals in Computer Understanding Speech and Vision written by Institut national de recherche en informatique et en automatique (France) and published by CUP Archive. This book was released on 1987-05-07 with total page 296 pages. Available in PDF, EPUB and Kindle. Book excerpt: Man-machine communication is presently undergoing an important evolution which is influenced both by technological advances and by the progress made in various fields such as signal processing, pattern recognition and artificial intelligence. This book emphasizes relevant aspects of man-machine dialogue by voice (acoustic-phonetic decoding, multi-speaker aspects, dialogue architectures, etc.) and presents analogies with the related fields of computer vision and natural language processing. It also introduces the fundamentals of knowledge-based and expert systems which are widely used in this field. The book is the result of an interdisciplinary collaboration of international experts who worked together for an advanced course sponsored by the Commission of the European Communities and Institut National de Recherche en Informatique et en Automatique. The course was held in Paris in May 1985.

Book Robust Automatic Speech Recognition

Download or read book Robust Automatic Speech Recognition written by Jinyu Li and published by Academic Press. This book was released on 2015-10-30 with total page 308 pages. Available in PDF, EPUB and Kindle. Book excerpt: Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will: Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition; Learn the links and relationship between alternative technologies for robust speech recognition; Be able to use the technology analysis and categorization detailed in the book to guide future technology development; Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition. The first book that provides a comprehensive review on noise- and reverberation-robust speech recognition methods in the era of deep neural networks; connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment; provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques; written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years.

Book New Era for Robust Speech Recognition

Download or read book New Era for Robust Speech Recognition written by Shinji Watanabe and published by Springer. This book was released on 2017-10-30 with total page 433 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book covers the state-of-the-art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of real-world applications, benchmark tools and datasets widely used in the field. This book is intended for researchers and practitioners working in the field of speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.
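
One of the listed techniques, training data augmentation, is commonly realized by mixing clean utterances with noise at a chosen signal-to-noise ratio. The following sketch assumes that simple additive-mixing recipe; it is an illustration, not a procedure taken from the book.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR in dB
    (a common data-augmentation recipe; illustrative only)."""
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example with synthetic signals (stand-ins for real waveforms).
rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.normal(size=8000)
noisy = mix_at_snr(clean, noise, snr_db=5.0)
print(noisy.shape)
```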

Book Advances in Nonlinear Speech Processing

Download or read book Advances in Nonlinear Speech Processing written by Jordi Sole-Casals and published by Springer. This book was released on 2010-03-10 with total page 209 pages. Available in PDF, EPUB and Kindle. Book excerpt: This volume contains the proceedings of NOLISP 2009, an ISCA Tutorial and Workshop on Non-Linear Speech Processing held at the University of Vic (Catalonia, Spain) during June 25-27, 2009. NOLISP 2009 was preceded by three editions of this biannual event held 2003 in Le Croisic (France), 2005 in Barcelona, and 2007 in Paris. The main idea of NOLISP workshops is to present and discuss new ideas, techniques and results related to alternative approaches in speech processing that may depart from the mainstream. In order to work at the front end of the subject area, the following domains of interest have been defined for NOLISP 2009: 1. Non-linear approximation and estimation; 2. Non-linear oscillators and predictors; 3. Higher-order statistics; 4. Independent component analysis; 5. Nearest neighbors; 6. Neural networks; 7. Decision trees; 8. Non-parametric models; 9. Dynamics for non-linear systems; 10. Fractal methods; 11. Chaos modeling; 12. Non-linear differential equations. The initiative to organize NOLISP 2009 at the University of Vic (UVic) came from the UVic Research Group on Signal Processing and was supported by the Hardware-Software Research Group. We would like to acknowledge the financial support obtained from the Ministry of Science and Innovation of Spain (MICINN), University of Vic, ISCA, and EURASIP. All contributions to this volume are original. They were subject to a double-blind refereeing procedure before their acceptance for the workshop and were revised after being presented at NOLISP 2009.

Book Automatic Speech and Speaker Recognition

Download or read book Automatic Speech and Speaker Recognition written by Joseph Keshet and published by John Wiley & Sons. This book was released on 2009-04-27 with total page 268 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book discusses large margin and kernel methods for speech and speaker recognition. Speech and Speaker Recognition: Large Margin and Kernel Methods is a collation of research in the recent advances in large margin and kernel methods, as applied to the field of speech and speaker recognition. It presents theoretical and practical foundations of these methods, from support vector machines to large margin methods for structured learning. It also provides examples of large margin based acoustic modelling for continuous speech recognizers, where the grounds for practical large margin sequence learning are set. Large margin methods for discriminative language modelling and text independent speaker verification are also addressed in this book. Key Features: Provides an up-to-date snapshot of the current state of research in this field; Covers important aspects of extending the binary support vector machine to speech and speaker recognition applications; Discusses large margin and kernel method algorithms for sequence prediction required for acoustic modeling; Reviews past and present work on discriminative training of language models, and describes different large margin algorithms for the application of part-of-speech tagging; Surveys recent work on the use of kernel approaches to text-independent speaker verification, and introduces the main concepts and algorithms; Surveys recent work on kernel approaches to learning a similarity matrix from data. This book will be of interest to researchers, practitioners, engineers, and scientists in speech processing and machine learning fields.
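
As a rough illustration of the kernel-method theme, the sketch below trains an RBF-kernel support vector machine to separate a target speaker's (synthetic) utterance embeddings from impostor embeddings and scores new trials. It is a generic example built on scikit-learn, not one of the book's algorithms, and the embedding dimensions and counts are made up.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for fixed-length per-utterance speaker embeddings.
rng = np.random.default_rng(2)
target = rng.normal(loc=1.0, size=(50, 20))      # utterances from the target speaker
impostor = rng.normal(loc=-1.0, size=(200, 20))  # utterances from other speakers

X = np.vstack([target, impostor])
y = np.array([1] * len(target) + [0] * len(impostor))

# A kernel SVM as a simple text-independent speaker-verification backend.
svm = SVC(kernel="rbf", gamma="scale", C=1.0)
svm.fit(X, y)

# Score new trials: larger decision values mean "more likely the target speaker".
trials = rng.normal(loc=1.0, size=(5, 20))
print(svm.decision_function(trials))
```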

Book Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition

Download or read book Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition written by Jinxi Guo and published by . This book was released on 2019 with total page 127 pages. Available in PDF, EPUB and Kindle. Book excerpt: Deep learning and neural network research has grown significantly in the fields of automatic speech recognition (ASR) and speaker recognition. Compared to traditional methods, deep learning-based approaches are more powerful in learning representation from data and building complex models. In this dissertation, we focus on representation learning and modeling using neural network-based approaches for speech and speaker recognition. In the first part of the dissertation, we present two novel neural network-based methods to learn speaker-specific and phoneme-invariant features for short-utterance speaker verification. We first propose to learn a spectral feature mapping from each speech signal to the corresponding subglottal acoustic signal which has less phoneme variation, using deep neural networks (DNNs). The estimated subglottal features show better speaker-separation ability and provide complementary information when combined with traditional speech features on speaker verification tasks. Additionally, we propose another DNN-based mapping model, which maps the speaker representation extracted from short utterances to the speaker representation extracted from long utterances of the same speaker. Two non-linear regression models using an autoencoder are proposed to learn this mapping, and they both improve speaker verification performance significantly. In the second part of the dissertation, we design several new neural network models which take raw speech features (either complex Discrete Fourier Transform (DFT) features or raw waveforms) as input, and perform the feature extraction and phone classification jointly. We first propose a unified deep Highway (HW) network with a time-delayed bottleneck layer (TDB) in the middle for feature extraction. The TDB-HW networks with complex DFT features as input provide significantly lower error rates compared with hand-designed spectrum features on large-scale keyword spotting tasks. Next, we present a 1-D Convolutional Neural Network (CNN) model, which takes raw waveforms as input and uses convolutional layers to do hierarchical feature extraction. The proposed 1-D CNN model outperforms standard systems with hand-designed features. In order to further reduce the redundancy of the 1-D CNN model, we propose a filter sampling and combination (FSC) technique, which can reduce the model size by 70% and still improve the performance on ASR tasks. In the third part of the dissertation, we propose two novel neural-network models for sequence modeling. We first propose an attention mechanism for acoustic sequence modeling. The attention mechanism can automatically predict the importance of each time step and select the most important information from sequences. Secondly, we present a sequence-to-sequence based spelling correction model for end-to-end ASR. The proposed correction model can effectively correct errors made by the ASR systems.
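
The 1-D convolutional model over raw waveforms described in the second part can be pictured with a short PyTorch sketch. The layer sizes, kernel widths and class count below are illustrative assumptions, not the architecture from the dissertation.

```python
import torch
import torch.nn as nn

class RawWaveformCNN(nn.Module):
    """Minimal 1-D CNN over raw waveforms (illustrative sizes only)."""

    def __init__(self, num_phones=40):
        super().__init__()
        self.features = nn.Sequential(
            # The first layer acts like a learnable filterbank over the waveform:
            # roughly a 25 ms window with a 10 ms hop at 16 kHz.
            nn.Conv1d(1, 64, kernel_size=400, stride=160),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.classifier = nn.Conv1d(128, num_phones, kernel_size=1)

    def forward(self, wav):               # wav: (batch, samples)
        x = self.features(wav.unsqueeze(1))
        return self.classifier(x)         # (batch, num_phones, frames)

model = RawWaveformCNN()
dummy = torch.randn(2, 16000)             # two 1-second utterances at 16 kHz
print(model(dummy).shape)                  # torch.Size([2, 40, 98])
```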

Book Speaker Normalisation for Automatic Speech Recognition

Download or read book Speaker Normalisation for Automatic Speech Recognition written by David Henry Deterding and published by . This book was released on 1990 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Robust Automatic Speech Recognition Employing Phoneme dependent Multi environment Enhanced Models Based Linear Normalization

Download or read book Robust Automatic Speech Recognition Employing Phoneme dependent Multi environment Enhanced Models Based Linear Normalization written by Igmar Hernández Ochoa and published by . This book was released on 2006 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: This work shows a robust normalization technique by cascading a speech enhancement method followed by a feature vector normalization algorithm. An efficient scheme used to provide speech enhancement is the Spectral Subtraction algorithm, which reduces the effect of additive noise by performing a subtraction of a noise spectrum estimate over the complete speech spectrum. On the other hand, a new and promising technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment Models based Linear Normalization) has also shown to be effective. PD-MEMLIN is an empirical feature vector normalization which models clean and noisy spaces by Gaussian Mixture Models (GMMs), and estimates the different compensation linear transformations to be performed to clean the signal. In this work the integration of both approaches is proposed. The final design is called PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models based Linear Normalization), which confirms and improves the effectiveness of both approaches. The results obtained show that in very highly degraded speech (between -5 dB and 0 dB) PD-MEEMLIN outperforms SS by a range between 11.4% and 34.5%, PD-MEMLIN by a range between 11.7% and 24.84%, and SPLICE by a range between 6.04% and 22.23%. Furthermore, at moderate SNR, i.e. 15 or 20 dB, PD-MEEMLIN is as good as the PD-MEMLIN and SS techniques.
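
The spectral subtraction stage described above can be sketched in a few lines of Python: estimate a noise magnitude spectrum from (assumed) speech-free leading frames, subtract it from each frame's magnitude, floor the result, and resynthesize by overlap-add. This is a generic textbook version, not the PD-MEEMLIN implementation, and the frame length, hop and floor values are illustrative.

```python
import numpy as np

def spectral_subtraction(x, frame_len=512, hop=256, noise_frames=10, floor=0.02):
    """Basic magnitude spectral subtraction (illustrative; the GMM-based
    normalization stage of PD-MEEMLIN is not reproduced here).

    The noise spectrum is estimated from the first `noise_frames` frames,
    which are assumed to contain no speech."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.zeros(len(x))
    norm = np.zeros(len(x))

    # Estimate the average noise magnitude spectrum from the leading frames.
    noise_mag = np.zeros(frame_len // 2 + 1)
    for i in range(noise_frames):
        seg = x[i * hop: i * hop + frame_len] * window
        noise_mag += np.abs(np.fft.rfft(seg))
    noise_mag /= noise_frames

    for i in range(n_frames):
        seg = x[i * hop: i * hop + frame_len] * window
        spec = np.fft.rfft(seg)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the noise estimate and floor the result to avoid negative magnitudes.
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
        out[i * hop: i * hop + frame_len] += clean * window
        norm[i * hop: i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-2)

# Example on a synthetic noisy sine (stand-in for noisy speech).
rng = np.random.default_rng(3)
t = np.arange(32000) / 16000
noisy = np.sin(2 * np.pi * 300 * t) + 0.3 * rng.normal(size=len(t))
print(spectral_subtraction(noisy).shape)
```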

Book Robust Automatic Speech Recognition by Integrating Speech Separation

Download or read book Robust Automatic Speech Recognition by Integrating Speech Separation written by Peidong Wang and published by . This book was released on 2021 with total page 138 pages. Available in PDF, EPUB and Kindle. Book excerpt: Automatic speech recognition (ASR) has been used in many real-world applications such as smart speakers and meeting transcription. It converts speech waveform to text, making it possible for computers to understand and process human speech. When deployed to scenarios with severe noise or multiple speakers, the performance of ASR degrades by large margins. Robust ASR refers to the research field that addresses such performance degradation. Conventionally, the robustness of ASR models to background noise is improved by cascading speech enhancement frontends and ASR backends. This approach introduces distortions to speech signals that can render speech enhancement useless or even harmful for ASR. As for the robustness of ASR models to speech overlaps, traditional frontends cannot use speaker profiles efficiently. In this dissertation, we investigate the integration of ASR backends with speech separation (including speech enhancement and speaker separation) frontends. We start our work by improving the performance of acoustic models in ASR. We propose an utterance-wise recurrent dropout method for a recurrent neural network (RNN) based acoustic model. With utterance-wise context better exploited, the word error rate (WER) reduces substantially. We also propose an iterative speaker adaptation method that can adapt the acoustic model to different speakers using the ASR output from the previous iteration. To obtain a better trade-off between noise reduction and speech distortion for robust monaural (i.e. single-channel) ASR, we train the acoustic model with a large variety of enhanced speech generated by a monaural speech enhancement model. This way, the influence of speech distortion on ASR can be alleviated. We then investigate the use of different types of enhanced features for distortion-independent acoustic modeling. Using distortion-independent acoustic modeling with magnitude features as input, we obtain state-of-the-art results on the second CHiME speech separation and recognition (CHiME-2) corpus. Multi-channel speech enhancement typically introduces less distortion than monaural speech enhancement. We first substitute the summation operation in beamforming with a learnable complex-domain convolutional layer. Operations in the complex domain leverage both magnitude and phase information. We then combine this complex-domain idea with a two-stage beamforming approach. The first stage extracts spatial features, and the second stage uses both the extracted spatial features and the original spectral features as input. This way, the second stage exploits spatial and spectral features explicitly. Using the proposed method, we achieve the state-of-the-art result on the 4th CHiME speech separation and recognition challenge (CHiME-4) corpus. While the enhancement of noisy speech leverages the differences between speech and noise in time-frequency (T-F) patterns, the separation of overlapped speech needs to use speaker-related information. We investigate speaker separation using an inventory of speaker profiles containing speaker identity information. We first select the speaker profiles involved in overlapped speech using an attention-based method. The selected speaker profiles are then used together with the original overlapped speech as input for speaker separation. To alleviate the problem caused by wrong speaker profile selection, we propose to use the output of speaker separation as selected speaker profiles for more iterations of speaker separation. Finally, speech contains sensitive personal data that users may not want to send to cloud-based servers for processing. Next-generation ASR systems should not only be robust to adverse conditions but also lightweight so that they can be deployed on-device. We investigate model compression methods for ASR that do not need model retraining. Our proposed weight sharing based model compression method achieves 9-fold compression with negligible performance degradation.
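
The attention-based speaker-profile selection described above can be pictured as scoring every profile embedding in the inventory against an embedding of the overlapped mixture and keeping the highest-weighted profiles. The sketch below uses plain dot-product attention with made-up embeddings and a hypothetical `select_speaker_profiles` helper; it is not the dissertation's model.

```python
import numpy as np

def select_speaker_profiles(mixture_emb, profile_bank, top_k=2):
    """Score an inventory of speaker-profile embeddings against an embedding
    of the overlapped mixture with scaled dot-product attention, and return
    the indices of the top_k profiles (illustrative stand-in)."""
    d = mixture_emb.shape[-1]
    scores = profile_bank @ mixture_emb / np.sqrt(d)      # one score per profile
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax attention weights
    selected = np.argsort(weights)[::-1][:top_k]
    return selected, weights

# Toy inventory: 6 enrolled speakers with 128-dimensional profile embeddings.
rng = np.random.default_rng(4)
profiles = rng.normal(size=(6, 128))
# The "mixture" embedding is dominated by speakers 1 and 4.
mixture = 0.6 * profiles[1] + 0.4 * profiles[4] + 0.1 * rng.normal(size=128)
chosen, att = select_speaker_profiles(mixture, profiles)
print(chosen)   # most likely [1, 4]: the speakers actually present in the mixture
```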

Book Normalization in the Acoustic Feature Space for Improved Speech Recognition

Download or read book Normalization in the Acoustic Feature Space for Improved Speech Recognition written by Sirko Molau and published by . This book was released on 2003 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Advances in Computational Collective Intelligence

Download or read book Advances in Computational Collective Intelligence written by Ngoc Thanh Nguyen and published by Springer Nature. This book was released on 2023-09-21 with total page 779 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 15th International Conference on Advances in Computational Collective Intelligence, ICCCI 2023, held in Budapest, Hungary, during September 27–29, 2023. The 59 full papers included in this book were carefully reviewed and selected from 218 submissions. They were organized in topical sections as follows: Collective Intelligence and Collective Decision-Making, Deep Learning Techniques, Natural Language Processing, Data Mining and Machine Learning, Social Networks and Speech Communication, Cybersecurity and Internet of Things, Cooperative Strategies for Decision Making and Optimization, Digital Content Understanding and Application for Industry 4.0, and Computational Intelligence in Medical Applications.

Book Acoustical and Environmental Robustness in Automatic Speech Recognition

Download or read book Acoustical and Environmental Robustness in Automatic Speech Recognition written by Alex Acero and published by Springer Science & Business Media. This book was released on 1992-11-30 with total page 216 pages. Available in PDF, EPUB and Kindle. Book excerpt: The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation from surface reflections in a room, or spectral shaping by microphones or the vocal tracts of individual speakers. Speech recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors, demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.
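
The directionally sensitive microphone-array processing mentioned above is classically realized with a delay-and-sum beamformer. The sketch below assumes a uniform linear array and a plane-wave geometry in which microphone 0 is reached first; it is a generic illustration, not a method from this particular book.

```python
import numpy as np

def delay_and_sum(mics, sample_rate, mic_spacing, angle_deg, c=343.0):
    """Steer a uniform linear microphone array toward `angle_deg` by
    aligning and averaging the channels (classic delay-and-sum).

    mics : (num_mics, num_samples) array of simultaneously recorded channels.
    Geometry assumption: microphone 0 is reached first by the plane wave."""
    num_mics, num_samples = mics.shape
    angle = np.deg2rad(angle_deg)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    output = np.zeros(num_samples)
    for m in range(num_mics):
        # Relative arrival delay of the target direction at microphone m.
        tau = m * mic_spacing * np.sin(angle) / c
        # Advance the channel by tau (a phase shift in the frequency domain)
        # so that components from the steering direction add coherently.
        spectrum = np.fft.rfft(mics[m]) * np.exp(2j * np.pi * freqs * tau)
        output += np.fft.irfft(spectrum, n=num_samples)
    return output / num_mics

# Example: a 4-microphone array, 5 cm spacing, steered 30 degrees off broadside.
rng = np.random.default_rng(5)
channels = rng.normal(size=(4, 16000))    # stand-in for real multichannel audio
enhanced = delay_and_sum(channels, sample_rate=16000, mic_spacing=0.05, angle_deg=30)
print(enhanced.shape)
```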

Book Automatic Speech Analysis and Recognition

Download or read book Automatic Speech Analysis and Recognition written by Jean-Paul Haton and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 373 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book is the result of the second NATO Advanced Study Institute on speech processing held at the Chateau de Bonas, France, from June 29th to July 10th, 1981. This Institute provided a high-level coverage of the fields of speech transmission, recognition and understanding, which constitute important areas where research activity has recently been associated with actual industrial developments. This book will therefore include both fundamental and applied topics. Ten survey papers by some of the best specialists in the field are included. They give an up-to-date presentation of several important problems in automatic speech processing. As a consequence the book can be considered as a reference manual on some important areas of automatic speech processing. The surveys are indicated by a * in the table of contents. This book also contains research papers corresponding to original works, which were presented during the panel sessions of the Institute. For the sake of clarity the book has been divided into five sections: 1. Speech Analysis and Transmission: An emphasis has been laid on the techniques of linear prediction (LPC), and the problems involved in the transmission of speech at various bit rates are addressed in detail. 2. Acoustics and Phonetics: One of the major bottlenecks in the development of speech recognition systems remains the transcription of the continuous speech wave into some discrete strings or lattices of phonetic symbols. Two survey papers discuss this problem from different points of view and several practical systems are also described.
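
The linear prediction (LPC) analysis highlighted in the first section can be sketched as solving the autocorrelation (Yule-Walker) equations for one windowed frame. The example below solves the small linear system directly for clarity; the Levinson-Durbin recursion would be the usual efficient choice, and the synthetic frame is only a stand-in for real speech.

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """Estimate LPC coefficients for one speech frame by solving the
    autocorrelation normal equations (illustrative sketch)."""
    frame = frame * np.hamming(len(frame))
    # Autocorrelation for lags 0..order.
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1: len(frame) + order]
    # Toeplitz system R a = r[1:], solved directly for clarity.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1: order + 1])
    return a   # predictor: frame[n] ~= sum_k a[k] * frame[n - 1 - k]

# Example on a synthetic vowel-like frame (damped resonance plus a little noise).
rng = np.random.default_rng(6)
t = np.arange(400) / 16000
frame = np.exp(-40 * t) * np.sin(2 * np.pi * 800 * t) + 0.01 * rng.normal(size=len(t))
print(lpc_coefficients(frame, order=10))
```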

Book Automatic Speech and Speaker Recognition

Download or read book Automatic Speech and Speaker Recognition written by Chin-Hui Lee and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 524 pages. Available in PDF, EPUB and Kindle. Book excerpt: Research in the field of automatic speech and speaker recognition has made a number of significant advances in the last two decades, influenced by advances in signal processing, algorithms, architectures, and hardware. These advances include: the adoption of a statistical pattern recognition paradigm; the use of the hidden Markov modeling framework to characterize both the spectral and the temporal variations in the speech signal; the use of a large set of speech utterance examples from a large population of speakers to train the hidden Markov models of some fundamental speech units; the organization of speech and language knowledge sources into a structural finite state network; and the use of dynamic programming-based heuristic search methods to find the best word sequence in the lexical network corresponding to the spoken utterance. Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks. Although no explicit partition is given, the book is divided into five parts: Chapters 1-2 are devoted to technology overviews; Chapters 3-12 discuss acoustic modeling of fundamental speech units and lexical modeling of words and pronunciations; Chapters 13-15 address the issues related to flexibility and robustness; Chapters 16-18 concern the theoretical and practical issues of search; Chapters 19-20 give two examples of algorithmic and implementational aspects for recognition system realization. Audience: A reference book for speech researchers and graduate students interested in pursuing potential research on the topic. May also be used as a text for advanced courses on the subject.