Automatic Speech Recognition Performance on a Voicemail Transcription Task

In this paper we report on the performance of ASR systems on a fairly difficult problem: transcribing voicemail. ASR systems comprise several building blocks, and we present several algorithms, each of which focuses on one of these blocks. The algorithms address lexicon design, feature extraction, hypothesis search, and speaker adaptation. Though the techniques are benchmarked on voicemail test data, their scope is not restricted to this domain, as they address fundamental aspects of the speech recognition process. We also report on the results of some cross-domain experiments that underline the "brittleness" of the speech recognition systems we use today and highlight the need to focus research attention on improving cross-domain performance.

By: Mukund Padmanabhan, George A. Saon, Jing Huang, Brian Kingsbury, Lidia L. Mangu

Published in: RC22172 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
