Direct density ratio estimation for large-scale covariate shift adaptation

Covariate shift is a situation in supervised learning where
training and test inputs follow different distributions while
the functional relation remains unchanged. A common
approach to compensating for the bias caused by covariate
shift is to reweight the training samples according to the
importance, which is the ratio of the test to training input densities.
In this paper, we address the problem of estimating the
importance from samples and propose a novel method that
allows us to directly estimate the importance without going
through the hard task of density estimation. An advantage of
the proposed method is that the computation time is nearly
independent of the number of test input samples, which
is highly beneficial in recent applications with abundant
unlabeled samples. We demonstrate through experiments
that the proposed method is computationally more efficient
than existing approaches while achieving competitive accuracy.
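
To make the reweighting idea in the abstract concrete, the following is a minimal sketch of importance-weighted least squares. It is not the estimator proposed in the paper: the training and test input densities are assumed to be known Gaussians here, so the importance can be evaluated in closed form, whereas the paper estimates it from samples. All function names, the density parameters, and the synthetic regression target are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance(x, train=(1.0, 0.5), test=(2.0, 0.25)):
    """Ratio of test to training input densities at x.

    Assumption: both densities are known Gaussians given as (mean, std).
    In practice they are unknown and the ratio must be estimated.
    """
    return gaussian_pdf(x, *test) / gaussian_pdf(x, *train)


def weighted_linear_fit(x, y, w):
    """Minimize sum_i w_i * (y_i - a * x_i - b)^2; returns (a, b)."""
    design = np.column_stack([x, np.ones_like(x)])
    sqrt_w = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns weighted least squares
    # into an ordinary least-squares problem.
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


rng = np.random.default_rng(0)

# Training inputs are drawn from the training density; the input-output
# relation (a noisy sinc function here) is the same for training and test.
x_train = rng.normal(1.0, 0.5, size=500)
y_train = np.sinc(x_train) + rng.normal(0.0, 0.1, size=500)

w = importance(x_train)
unweighted = weighted_linear_fit(x_train, y_train, np.ones_like(x_train))
weighted = weighted_linear_fit(x_train, y_train, w)

print("unweighted (slope, intercept):", unweighted)
print("weighted   (slope, intercept):", weighted)
```

The weighted fit emphasizes training points that fall where test inputs are likely, so a misspecified linear model is tuned for the test region rather than the training region.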

By: Yuta Tsuboi, Hisashi Kashima, Shohei Hido, Steffen Bickel, and Masashi Sugiyama

Published in: Proceedings of SIAM Data Mining in 2008

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

RT0756.pdf

Questions about this service can be mailed to reports@us.ibm.com.