A Generalized Kullback-Leibler Distance and its Minimization

In this note, we discuss a generalization of the concept of maximum entropy
to nonnegative functions that are not necessarily probability distributions.
We first extend the notion of the likelihood of training
data under a probability distribution to a generalized
likelihood under a nonnegative
function. We then define an exponential family of nonnegative
functions that has three components: a fixed prior nonnegative
function, a set of features, and their corresponding real weights. We
consider a well-posed optimization problem on the family. We show
that the objective function is globally convex, and that it
corresponds to the dual of a version of Kullback-Leibler distance
minimization (or entropy maximization) suitably generalized to
nonnegative functions. We then provide a closed-form solution to a
one-dimensional optimization problem with a single scalar binary
feature function. We propose a version of the improved iterative scaling
algorithm to solve a general multi-dimensional optimization problem
and prove its convergence to the unique solution.

By: Kishore Papineni

Published in: IBM Research Report RC21815, 2000
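For concreteness, the following is a brief sketch of the objects the abstract refers to, written under the common conventions for extending these quantities to nonnegative functions; the exact definitions used in the report may differ in detail. The divergence shown is the standard generalization of Kullback-Leibler distance to unnormalized nonnegative functions (sometimes called the I-divergence), and q_0, f_i, and lambda_i denote the prior function, features, and weights mentioned in the abstract.

```latex
% Generalized Kullback-Leibler distance (I-divergence) between two
% nonnegative functions p and q on a common domain; it reduces to the
% ordinary KL divergence when p and q both sum to one.
D(p \,\|\, q) \;=\; \sum_{x}\Big( p(x)\,\log\frac{p(x)}{q(x)} \;-\; p(x) \;+\; q(x) \Big)

% Exponential family of nonnegative functions built from a fixed prior
% q_0, feature functions f_i, and real weights \lambda_i.
q_{\lambda}(x) \;=\; q_0(x)\,\exp\!\Big(\sum_{i} \lambda_i\, f_i(x)\Big)
```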
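The report's own variant of improved iterative scaling is not reproduced here. The sketch below is an assumed, illustrative implementation of an improved-iterative-scaling-style fit for an unnormalized exponential model of the above form over a finite domain with nonnegative features. The names iis_unnormalized, q0, F, and p_tilde are hypothetical, and the bisection-based inner solve is just one standard way to carry out the per-feature update.

```python
import numpy as np

def iis_unnormalized(q0, F, p_tilde, n_iters=100):
    """Illustrative IIS-style fit of q_lambda(x) = q0(x) * exp(sum_i lam[i] * F[i, x]).

    q0      : (X,)   prior nonnegative function on a finite domain of size X
    F       : (K, X) nonnegative feature functions f_i(x)
    p_tilde : (X,)   empirical distribution of the training data
    """
    K, X = F.shape
    lam = np.zeros(K)
    f_sharp = F.sum(axis=0)          # f#(x) = sum_i f_i(x), as in standard IIS
    target = F @ p_tilde             # empirical feature expectations E_p~[f_i]
    for _ in range(n_iters):
        q = q0 * np.exp(F.T @ lam)   # current unnormalized model q_lambda
        for i in range(K):
            # Per-feature update: solve sum_x q(x) f_i(x) exp(delta * f#(x)) = target[i]
            # for delta by bisection; the left-hand side is monotone in delta
            # because q, f_i, and f# are all nonnegative.
            lo, hi = -20.0, 20.0
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                val = np.sum(q * F[i] * np.exp(mid * f_sharp))
                lo, hi = (mid, hi) if val < target[i] else (lo, mid)
            lam[i] += 0.5 * (lo + hi)
    return lam
```

As a consistency check, with a single binary feature f we have f#(x) = f(x), and the update equation solves in one step to lambda = log(E_p~[f] / E_q0[f]), matching the flavor of the closed-form one-dimensional solution the abstract mentions (again, under the assumed form of the objective).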
