Learning to Annotate Video Databases

Copyright 2001 Society of Photo-Optical Instrumentation Engineers. This paper was (will be) published in and is made available as an electronic reprint [preprint] with permission of SPIE. Single print or electronic copies for personal use only are allowed. Systematic or multiple reproduction, distribution to multiple locations through an electronic listserver or other electronic means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are all prohibited. By choosing to view or print this document, you agree to all the provisions of the copyright law protecting it.

A model-based approach to video retrieval requires ground-truth data for training the models. This motivates video annotation tools that let users annotate each shot in a video sequence and identify and label scenes, events, and objects by applying labels at the shot level. The annotation tool considered here also allows the user to associate object labels with an individual region in a key-frame image. However, the abundance of video data and the diversity of labels make annotation a difficult and overly expensive task. To combat this problem, we formulate annotation as supervised training with partially labeled data and treat it as an exercise in active learning. In this scenario, one first trains a classifier with a small set of labeled data and then updates the classifier by selecting the most informative, or most uncertain, subset of the available data set for labeling. Labels are also propagated automatically to the as-yet unlabeled data.
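The active learning loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper reports experiments with support vector machines with polynomial kernels, whereas this sketch swaps in a kernel perceptron with a polynomial kernel so that it runs with the Python standard library alone. The names `poly_kernel`, `KernelPerceptron`, `active_learning`, and the `oracle` callback (standing in for the human annotator) are all hypothetical, and "most uncertain" is assumed to mean the smallest absolute decision value.

```python
import random

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

class KernelPerceptron:
    """Assumed stand-in for an SVM: a kernel perceptron with the same kernel family."""

    def __init__(self, kernel=poly_kernel, epochs=20):
        self.kernel = kernel
        self.epochs = epochs
        self.support = []  # (mistake count, label, point) triples

    def decision(self, x):
        # Signed score; its magnitude is used as a confidence proxy.
        return sum(a * y * self.kernel(sx, x) for a, y, sx in self.support)

    def fit(self, X, y):
        alphas = [0] * len(X)
        for _ in range(self.epochs):
            mistakes = 0
            for i, xi in enumerate(X):
                score = sum(alphas[j] * y[j] * self.kernel(X[j], xi)
                            for j in range(len(X)) if alphas[j])
                if y[i] * score <= 0:  # misclassified: strengthen this point
                    alphas[i] += 1
                    mistakes += 1
            if mistakes == 0:
                break
        self.support = [(a, yj, xj) for a, yj, xj in zip(alphas, y, X) if a]
        return self

def active_learning(pool, oracle, seed_idx, rounds=5, batch=2):
    """Train on a small seed set, then repeatedly query the most uncertain items."""
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    clf = KernelPerceptron()

    def refit():
        idx = sorted(labeled)
        clf.fit([pool[i] for i in idx], [labeled[i] for i in idx])

    for _ in range(rounds):
        refit()
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: smallest |decision value| first.
        unlabeled.sort(key=lambda i: abs(clf.decision(pool[i])))
        for i in unlabeled[:batch]:
            labeled[i] = oracle(pool[i])  # ask the annotator
    refit()
    # Propagate predicted labels to everything the annotator never saw.
    propagated = {i: (1 if clf.decision(pool[i]) > 0 else -1)
                  for i in range(len(pool)) if i not in labeled}
    return clf, labeled, propagated

# Toy data: 2-D points, positive inside a circle (an invented example).
rng = random.Random(0)
pool = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(60)]
oracle = lambda p: 1 if p[0] ** 2 + p[1] ** 2 < 1.5 else -1
clf, labeled, propagated = active_learning(pool, oracle, seed_idx=[0, 1, 2, 3])
print(len(labeled), "labeled by the annotator;", len(propagated), "propagated")
```

In this sketch the annotator only ever labels the seed items plus `rounds * batch` queried items; every other item receives a propagated label from the final classifier.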

The purpose of this paper is twofold. The first is to describe a video annotation tool that has been developed for annotating generic video sequences in the context of a recent video-TREC benchmarking exercise. The tool is semi-automatic: it automatically propagates labels to "similar" shots, and the user then confirms or rejects each propagated label. The second purpose is to show how an active learning strategy could be implemented in this context to further improve the performance of the annotation tool. While many versions of active learning are conceivable, we specifically report results of experiments with support vector machine classifiers with polynomial kernels.
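The semi-automatic step, in which labels are propagated to similar shots and the user confirms or rejects each one, might look like the following sketch. The abstract does not say how shot similarity is computed, so the per-shot feature vectors, the cosine similarity measure, the `threshold` value, and the function names `propose_labels` and `review` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def propose_labels(features, labels, threshold=0.9):
    """Propose, for each unlabeled shot, the label of its most similar labeled shot."""
    proposals = {}
    for shot, feat in features.items():
        if shot in labels:
            continue
        best = max(labels, key=lambda s: cosine(feat, features[s]), default=None)
        if best is not None and cosine(feat, features[best]) >= threshold:
            proposals[shot] = labels[best]
    return proposals

def review(proposals, labels, confirm):
    """Keep only the proposals the user confirms; report the rejected shots."""
    accepted = dict(labels)
    rejected = []
    for shot, label in proposals.items():
        if confirm(shot, label):
            accepted[shot] = label
        else:
            rejected.append(shot)
    return accepted, rejected

# Invented example: four shots with 2-D features, two of them already labeled.
features = {"s1": (1.0, 0.0), "s2": (0.9, 0.1), "s3": (0.0, 1.0), "s4": (0.1, 0.95)}
labels = {"s1": "outdoors", "s3": "face"}
proposals = propose_labels(features, labels)
# The confirm callback stands in for the user's accept/reject decision.
accepted, rejected = review(proposals, labels, confirm=lambda shot, label: shot != "s4")
```

Keeping the proposal step separate from the review step means a rejected label never enters the ground-truth set, which matters if the labels are later used to train retrieval models.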

By: Milind R. Naphade, John R. Smith, Sankar Basu, Belle L. Tseng, Ching-Yung Lin

Published in: SPIE Proceedings, volume 4676, pages 264-75, 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

rc22231.pdf
