Effects of Automated Transcription Quality on Non-native Speakers' Comprehension in Real-time Computer-mediated Communication

Real-time transcription has been shown to be valuable in facilitating non-native speakers’ comprehension in real-time communication. Automated speech recognition (ASR) technology is a critical ingredient for its practical deployment. This paper presents a series of studies investigating how the quality of transcripts generated by an ASR system affects user comprehension and subjective evaluation. Experiments are first presented comparing performance across three transcription conditions: no transcript, a perfect transcript, and a transcript with a Word Error Rate (WER) of 20%. We found that a WER of 20% was the most likely critical point at which transcripts are just acceptable and useful. We then examined a lower WER of 10% (a lower bound for today’s state-of-the-art systems) using the same experimental design. The results indicated that at 10% WER, comprehension performance was significantly improved compared to the no-transcript condition. Finally, implications for further system development and design are discussed.

By: Ying Xin Pan; Dan Ning Jiang; Yong Qin; Michael Picheny

Published in: RC24920 in 2009


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

