A study of computational auditory scene analysis

In a real environment, a mixture of sounds reaches the ears, yet humans can still reliably recognize individual voices. For example, a person at a cocktail party can direct his or her attention to the voice of one particular speaker. This is called the "cocktail party effect."
In contrast, an automatic speech recognition system has no function equivalent to the cocktail party effect, and therefore cannot perform adequately in a real environment.
Recently, Bregman has contended that the mixture of sounds reaching the ears is subjected to auditory scene analysis. He uses this idea as a basis for understanding various auditory phenomena experienced by humans. However, his idea does not directly help to improve the performance of automatic speech recognition in a real environment.
In this paper, we introduce "computational auditory scene analysis (CASA)," whose aim is to enable computers to perform auditory scene analysis. The main problem in CASA is how to realize the cocktail party effect on a computer.
As an example of CASA, we discuss the research of Guy Brown at the University of Sheffield in England and explain how we implemented his technique in MATLAB.

By: Masaharu Sakamoto, Michio Yamada (University of Tokyo)

Published in: RT0295 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

