Improved Text Overlay Detection in Videos using a Fusion-Based Classifier

In this paper, classifier fusion is adopted to improve the performance of our text overlay detection in the NIST TREC-2002 Video Retrieval Benchmark. A normalized ensemble fusion is explored to combine two text overlay detection models. The fusion incorporates normalization of confidence scores, aggregation via a combiner function, and an optimized selection. The proposed fusion classifier performed best among the 11 detectors submitted to the NIST text overlay detection benchmark, and its average precision is 227% of that of the second-best detector in the benchmark.
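
The three fusion stages named in the abstract (score normalization, a combiner function, and selection of the best configuration) can be illustrated with a minimal Python sketch. The abstract does not say which normalizers, combiners, or selection criterion the authors used, so everything concrete here is an assumption: min-max and rank normalization, average/max/product combiners, and selection by non-interpolated average precision on a labeled validation set. All function names are hypothetical.

```python
from itertools import product


def min_max(scores):
    """Rescale scores linearly to [0, 1] (assumed normalizer)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)  # no spread: give every item the same score
    return [(s - lo) / (hi - lo) for s in scores]


def rank_norm(scores):
    """Replace each score by its rank scaled to [0, 1] (assumed normalizer)."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        out[idx] = rank / (n - 1)
    return out


# Candidate normalizers and combiners; these sets are illustrative assumptions.
NORMALIZERS = {"min_max": min_max, "rank": rank_norm}
COMBINERS = {
    "average": lambda a, b: (a + b) / 2,
    "max": max,
    "product": lambda a, b: a * b,
}


def average_precision(scores, labels):
    """Non-interpolated average precision over the full ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision at each relevant item
    return total / hits if hits else 0.0


def select_fusion(scores_a, scores_b, labels):
    """Try every normalizer/combiner pair and keep the one with the best AP.

    Returns (ap, normalizer_name, combiner_name, fused_scores).
    """
    best = None
    for (norm_name, norm), (comb_name, comb) in product(
        NORMALIZERS.items(), COMBINERS.items()
    ):
        a, b = norm(scores_a), norm(scores_b)
        fused = [comb(x, y) for x, y in zip(a, b)]
        ap = average_precision(fused, labels)
        if best is None or ap > best[0]:
            best = (ap, norm_name, comb_name, fused)
    return best
```

Calling `select_fusion(scores_a, scores_b, labels)` with the confidence scores of two detectors on the same shots and their ground-truth labels returns the best-scoring configuration. In practice the selection would be made on held-out validation data and then applied unchanged to the test set, to avoid tuning on the data being evaluated.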

By: Belle L. Tseng, Ching-Yung Lin, Dongqing Zhang, John R. Smith

Published in: Proceedings of the 2003 International Conference on Multimedia and Expo, Piscataway, NJ, IEEE, vol. 3, pp. 473-476, 2003

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
