Sparse Representation Phone Identification Features for Speech Recognition

Sparse representation techniques, such as Support Vector Machines (SVMs), k-nearest neighbor (kNN) and Bayesian Compressive Sensing (BCS), can be used to characterize a test sample from a few support training samples in a dictionary set. In this paper, we introduce a semi-Gaussian constraint into the BCS formulation, which allows the support parameters to be estimated using a closed-form iterative solution. We show that using this approach for phonetic classification, where phones are the basic units of speech to be recognized, yields higher accuracy than other non-parametric techniques. Motivated by this result, we create a new dictionary that is a function of the phonetic labels of the original dictionary. The support vectors now select relevant samples from this new dictionary to create a new representation of the test sample, one in which the test sample is better linked to the actual units to be recognized. We present results using these new features in a Hidden Markov Model (HMM) framework for speech recognition. These features achieve a Phonetic Error Rate (PER) of 23.9% on the TIMIT phonetic recognition task, the best result on TIMIT to date when HMM parameters are trained using the maximum likelihood principle.
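
The label-dictionary idea described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the report: the support parameters are estimated here with plain ridge regression as a stand-in (not the BCS solver with a semi-Gaussian constraint), the label dictionary is assumed to hold one-hot class columns, and the function names, the regularization value and the toy data are all hypothetical.

```python
import numpy as np

def ridge_support(H, y, lam=1e-2):
    """Stand-in solver (assumption, not the report's BCS method):
    beta = argmin ||y - H beta||^2 + lam * ||beta||^2."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)

def label_dictionary(labels, num_classes):
    """Hypothetical label dictionary: column j is a one-hot vector
    encoding the class of training sample j."""
    L = np.zeros((num_classes, len(labels)))
    L[labels, np.arange(len(labels))] = 1.0
    return L

def phone_id_feature(H, labels, y, num_classes, lam=1e-2):
    """Estimate beta against the original dictionary H, then apply the
    same beta to the label dictionary to get a class-level feature."""
    beta = ridge_support(H, y, lam)
    return label_dictionary(labels, num_classes) @ beta

# Toy data: three well-separated classes in 2-D, four samples per class.
rng = np.random.default_rng(0)
means = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
labels = np.repeat(np.arange(3), 4)
H = (means[labels] + 0.1 * rng.standard_normal((12, 2))).T  # shape (2, 12)
y = means[1] + 0.1 * rng.standard_normal(2)                 # test sample from class 1

feat = phone_id_feature(H, labels, y, num_classes=3)
predicted = int(np.argmax(feat))
print(feat, predicted)
```

In this sketch the new representation has one entry per class rather than one per acoustic dimension, which is one way to read "better linked to the actual units to be recognized". A sparsity-promoting solver would concentrate the weight on fewer training samples than ridge regression does.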

By: Tara N. Sainath; David Nahamoo; Bhuvana Ramabhadran; Dimitri Kanevsky

Published in: RC24983 in 2010


