Bayesian Regression Selecting Valuable Subset from Mixed Bag Training Data

This paper addresses the problem of learning a regression model from sets of training data. Each set has only a single label, and only one of the training instances in the set reflects that label.
This situation arises in particular when a label is attached to a group of data, as with time-series data: the label is attached not to a single point of the sequence but to a particular time window of the sequence. As such, a small part of the time window is likely to reflect the label, whereas the remaining, larger part is not.
Using Bayesian modeling and posterior inference with variational Bayes, we design an algorithm that estimates which of the training instances in each set corresponds to the label and, at the same time, trains the regression model.
Our experimental results show that our approach performs better than baseline methods on an artificial dataset and on a real-world dataset.
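
To make the problem setting concrete, the following is a minimal sketch of the "one instance per set reflects the label" idea. It is not the variational Bayes procedure of the report: it swaps in a simple EM-style alternation between soft instance responsibilities and a weighted ridge regression, assuming a linear model, Gaussian noise, a uniform prior over which instance is the relevant one, and sets of equal size. The function name `fit_mixed_bag_regression` and the parameters `n_iter` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_mixed_bag_regression(X, y, n_iter=50, lam=1e-3):
    """EM-style sketch (an assumption, not the report's algorithm).

    X : array of shape (n_bags, bag_size, d), the instances in each set
    y : array of shape (n_bags,), one label per set
    Returns the weight vector w and the responsibilities r, where
    r[i, j] is the estimated probability that instance j of set i
    is the one that reflects the label y[i].
    """
    n_bags, bag_size, d = X.shape
    # Start from a uniform belief over which instance carries the label.
    r = np.full((n_bags, bag_size), 1.0 / bag_size)
    w = np.zeros(d)
    for _ in range(n_iter):
        # M-step: ridge regression weighted by the responsibilities.
        A = np.einsum("ij,ijk,ijl->kl", r, X, X) + lam * np.eye(d)
        b = np.einsum("ij,ijk,i->k", r, X, y)
        w = np.linalg.solve(A, b)
        # Squared residual of every instance against its set's label.
        resid2 = (y[:, None] - X @ w) ** 2
        # Noise variance estimate, floored to avoid division by zero.
        sigma2 = max(np.sum(r * resid2) / n_bags, 1e-8)
        # E-step: instances that explain the label well get more weight.
        log_r = -0.5 * resid2 / sigma2
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
    return w, r
```

Taking `r.argmax(axis=1)` gives a point estimate of the relevant instance in each set. Like any EM-style scheme, this sketch can settle in a local optimum, and unlike a full Bayesian treatment it returns no posterior over the regression weights.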

By: Takayuki Katsuki, Masato Inoue

Published in: RT0976 in 2016


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

