Fepstrum Features: Design and Application to Conversational Speech Recognition

In this paper, we present the Fepstrum features:
a principled approach to estimating the modulation spectrum
of speech signals using Hilbert envelopes in a nonparametric
way. The importance of the modulation spectrum
as a feature in automatic speech recognition (ASR) has
long been established by several researchers over the past two to three
decades. However, traditionally, in the speech recognition
literature the modulation spectrum features have been extracted
as the DCT/DFT of the log Mel-filter energies over 10 to 15
frames. These Mel-filter energies are in turn computed from the
short-term spectrum (with a 20 to 30 ms primary window).
We show that this approach leads to a crude approximation of
the modulation spectrum in the Mel-filter bands. Further, we
show that the log of a particular Mel filter's Hilbert envelope
(obtained over a primary analysis window of 100 ms) leads
to a principled amplitude modulation (AM) signal estimate in
that band. The lower DCT coefficients (in the range 0 to 25 Hz)
of the AM signal yield the fepstrum features. To assess
the effectiveness of the fepstrum features, we have performed
conversational telephony speech (CTS) recognition experiments
on the Switchboard (SWB) corpus using a recently developed
LVCSR library (IBM IrlTK). Our experiments indicate that
the fepstrum features, in simple concatenation with the short-term
spectral envelope features (MFCC), provide up to 2.5%
absolute improvement in phoneme recognition accuracy and a
2.5% to 3.5% absolute improvement in word recognition accuracy
on a 1.5-hour SWB test set with a 2,300-word vocabulary. We
also provide the details of our IrlTK LVCSR acoustic modeling
library.
EDICS: SPE-ANLS, SPE-RECO, SPE-LVCR
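
The extraction steps named in the abstract (band-limit a 100 ms window, take the log of its Hilbert envelope, keep the DCT coefficients below 25 Hz) can be illustrated with the minimal NumPy sketch below. It is not the report's implementation. The 8 kHz sampling rate, the ideal rectangular band-pass mask standing in for a Mel filter, the unnormalised DCT-II, the log floor, and all function names are assumptions made for illustration only.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_m x[m] * cos(pi * (m + 0.5) * k / N)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * (m + 0.5) * k / n) @ x

def fepstrum_sketch(frame, fs, band_hz, max_mod_hz=25.0):
    """Low-order DCT coefficients of the log Hilbert envelope of one band."""
    n = len(frame)
    # Assumption: an ideal band-pass mask stands in for a Mel filter.
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    band = np.fft.irfft(np.fft.rfft(frame) * mask, n=n)
    # The Hilbert envelope is the magnitude of the analytic signal.
    envelope = np.abs(analytic_signal(band))
    log_env = np.log(envelope + 1e-10)  # small floor avoids log(0)
    coeffs = dct2(log_env)
    # DCT-II basis k oscillates at k * fs / (2 * n) Hz; keep those <= max_mod_hz.
    n_keep = int(np.floor(max_mod_hz * 2 * n / fs)) + 1
    return coeffs[:n_keep]

# Synthetic check: a 1 kHz carrier amplitude-modulated at 10 Hz, 100 ms at 8 kHz.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
frame = (1.0 + 0.5 * np.cos(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * 1000 * t)
feats = fepstrum_sketch(frame, fs, band_hz=(800.0, 1200.0))
```

With a 100 ms window at 8 kHz, adjacent DCT-II coefficients are spaced 5 Hz apart, so the 0 to 25 Hz range corresponds to the first six coefficients per band under these assumptions.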

By: Vivek Tyagi

Published in: RI11009 in 2011

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
