Towards Robust Features for Classifying Audio in the CueVideo System

The role of audio in multimedia applications involving video is becoming increasingly important. Many efforts in this area focus on audio data with built-in semantic structure, such as broadcast news, or on classifying audio that contains a single type of sound, such as clear speech or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e. combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in areas such as detecting story boundaries in video, spoken document retrieval, and audio retrieval. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides helpful insights into analyzing mixed audio in the context of real applications.

By: S. Srinivasan, D. Petkovic, D. Ponceleon

Published in: RJ10142 in 1999
