The ETSI Extended Distributed Speech Recognition (DSR) Standards: Server-Side Speech Reconstruction

In this paper we present work carried out in developing the ETSI Extended DSR standards ES 202 211 and ES 202 212 [1][2]. These standards extend the previous ETSI DSR standards: the basic front-end ES 201 108 and the advanced (noise-robust) front-end ES 202 050, respectively. The extensions enable enhanced tonal language recognition as well as server-side speech reconstruction. This paper discusses the server-side speech reconstruction, whereas a companion paper discusses the front-end extension and tonal language recognition. Experimental results show that the reconstructed speech produced by the standards is highly intelligible under both clean and noisy background conditions: the DRT (Diagnostic Rhyme Test) and TT (Transcription Test) scores meet or exceed the objective values corresponding to the US DoD (Department of Defense) Federal Standard MELP (Mixed-Excitation Linear Predictive) coder operating at 2400 bps.

By: Tenkasi Ramabadran, Alexander Sorin, Michael McLaughlin, Dan Chazan, David Pearce, Ron Hoory

Published in: H-0200 in 2003

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
