On Variable Sampling Frequencies In Speech Recognition

        In this paper we describe a novel approach to the problem of differing sampling frequencies in speech recognition. In general, when a recognition task requires a sampling frequency different from that of the reference system, it is customary to retrain the system for the new sampling rate. To circumvent the tedious training process, we propose a new approach, termed Sampling Rate Transformation (SRT), that performs the transformation directly on the speech recognition system. By re-scaling the mel-filter design and filtering the system in the spectral domain, SRT converts the existing system to the target spectral range. New systems are obtained without using any data from the test environment. Preliminary experiments show that SRT reduces the word error rate from 29.89% to 18.17% given 11 kHz test data and a 16 kHz system. The matched system for 11 kHz has an error rate of 16.17%. We also examine MLLR and MAP. The best result from MLLR is 17.92% with 4.5 hours of speech. In the speaker adaptation mode, SRT reduces the error rate from 15.48% to 9.71% given 11 kHz test data and a 16 kHz SA system, while the matched 11 kHz SA system has an error rate of 9.33%.
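
The abstract does not give the details of SRT, so the following is only a minimal sketch of the underlying bandwidth mismatch, not the authors' method. It assumes the common mel formula mel = 2595 log10(1 + f/700), a hypothetical bank of 24 triangular filters, and Nyquist limits of 8000 Hz (16 kHz system) and 5500 Hz (11 kHz data). All function names and constants are illustrative assumptions.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595*log10 formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_edges(num_filters, f_low_hz, f_high_hz):
    """Return num_filters + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Triangular filter i spans edges[i] .. edges[i + 2], peaking at edges[i + 1].
    """
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + i * step) for i in range(num_filters + 2)]

def filters_within_band(edges, nyquist_hz):
    """Indices of filters whose upper edge does not exceed the given Nyquist limit."""
    return [i for i in range(len(edges) - 2) if edges[i + 2] <= nyquist_hz]

# Hypothetical configuration: the filter count is an assumption, not from the report.
NUM_FILTERS = 24
NYQUIST_16K_HZ = 8000.0   # half of 16 kHz
NYQUIST_11K_HZ = 5500.0   # half of 11 kHz, taking "11 kHz" literally

edges_16k = mel_filter_edges(NUM_FILTERS, 0.0, NYQUIST_16K_HZ)
usable = filters_within_band(edges_16k, NYQUIST_11K_HZ)

print(f"{len(usable)} of {NUM_FILTERS} filters lie entirely below {NYQUIST_11K_HZ:.0f} Hz")
```

Under these assumptions, the upper filters of the 16 kHz bank cover frequencies that 11 kHz data cannot contain, which illustrates why a system trained at one rate is mismatched at another and why some re-scaling of the filter design is needed.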

By: Fu-Hua Liu, Michael Picheny

Published in: RC21261 in 1998

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.