How Productivity Improves in Hands-Free Continuous Dictation Tasks: Lessons Learned from a Longitudinal Study

Speech recognition (SR) technology continues to improve, but users still experience significant difficulty using the software to create and edit documents. The reported composition speed using speech software is only between 8 and 15 words per minute (Karat et al., 1999; Sears et al., 2001), much lower than people’s normal speaking speed of 125 to 150 words per minute (Hobbs, 2001). What causes the huge gap between natural speaking and composing using speech recognition? Is it possible to narrow the gap and make speech recognition more promising to users? In this paper we discuss users’ learning processes and the difficulties they experience in continuous dictation tasks using state-of-the-art Automatic Speech Recognition (ASR) software. Detailed data was collected for the first time on various aspects of the three activities involved in document composition tasks: dictation, navigation, and correction. The results indicate that navigation and error correction accounted for a large share of the dictation task during the early stages of interaction. As users gained more experience, they became more efficient at dictation, navigation, and error correction. However, the major improvements in productivity were due to dictation quality and the usage of navigation commands. These results provide insights into the factors that cause the gap between user expectations of speech recognition software and the reality of use, and into how those factors changed with experience. Specific advice is given to researchers as to the most critical issues that must be addressed.

By: Jinjuan Feng, Clare-Marie Karat, Andrew Sears

Published in: Interacting with Computers, volume 17 (no. 3), pages 265-289, 2005

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
