Evaluating the Use of Data Transformation for Information Visualization

Information visualization software helps people interactively explore and explain large, complex data sets. In real-world applications, however, raw data often need to be pre-processed (e.g., cleaned and sampled) before they can be properly visualized and comprehended by humans. To make matters worse, in a complex information analysis process a user’s information needs evolve continuously, so the data to be visualized cannot be anticipated. In such cases, a system may need to decide dynamically how best to prepare (transform) the raw data for visualization.

To address this unique challenge, we are developing technologies that dynamically decide which data transformations are needed in diverse user interaction situations. To justify and guide our work in this area, we have designed and conducted an empirical study that examines the impact of data transformation on user performance under various conditions. Our study comprises two experiments. The first examines the impact of data transformation on user performance in simple analysis tasks. The second examines its impact on user performance in multi-step, complex analysis tasks. Our results show that visualizations of properly transformed data help users perform their tasks better than visualizations of non-transformed data do. The improvement is statistically significant for both simple and complex tasks. In addition, our analyses show that the benefits of data transformation vary under different conditions (e.g., task type and visualization type). Moreover, the choice of data transformation depends on context (e.g., a user’s analytic history) and thus needs to be decided dynamically.

By: Zhen Wen; Michelle X. Zhou

Published in: RC24358 in 2007


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.