Personalized Video Summary Using Visual Semantic Annotations and Automatic Speech Transcriptions

Our video personalization and summarization system dynamically generates a personalized video summary based on user preferences and the usage environment. The three-tier personalization system adopts a server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. The process comprises shot-to-sentence alignment, summary segment selection, and user-preference matching and propagation. As a result, the relevant visual shot and audio sentence segments are aggregated and composed into a personalized video summary.
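The selection process described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the data structures, scoring, and function names below are illustrative assumptions: shots and sentences are paired by temporal overlap, each pair is scored by how many user-preference keywords its annotations and transcript words match, and the top-scoring segments form the summary.

```python
# Hypothetical sketch of the middleware's selection step: align visual shots
# with transcript sentences by temporal overlap, score each aligned segment
# against user-preference keywords, and keep the top-scoring segments.
# All structures and names here are assumptions for illustration only.

def overlap(a, b):
    """Temporal overlap in seconds between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def align_shots_to_sentences(shots, sentences):
    """Pair each shot with the transcript sentences it overlaps in time."""
    pairs = []
    for shot in shots:
        matched = [s for s in sentences if overlap(shot["span"], s["span"]) > 0]
        pairs.append((shot, matched))
    return pairs

def score(shot, matched_sentences, preferences):
    """Count preference keywords found in shot annotations or sentence text."""
    terms = set(shot["annotations"])
    for s in matched_sentences:
        terms.update(s["text"].lower().split())
    return sum(1 for p in preferences if p in terms)

def summarize(shots, sentences, preferences, max_segments=2):
    """Return the time spans of the best-matching segments."""
    pairs = align_shots_to_sentences(shots, sentences)
    ranked = sorted(pairs, key=lambda p: score(p[0], p[1], preferences),
                    reverse=True)
    return [p[0]["span"] for p in ranked[:max_segments]]
```

In this toy form, a segment's score rises when either its visual annotations or its aligned speech transcript matches the user's preference terms, which mirrors the paper's idea of combining both metadata streams during selection.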

By: Belle L. Tseng, Ching-Yung Lin

Published in: Proceedings of the 2002 IEEE Workshop on Multimedia Signal Processing. Piscataway, NJ: IEEE, 2002, pp. 5-8.

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
