Data Layouts for Object-oriented Programs

Object-oriented programs rely heavily on objects and pointers, making them vulnerable to slowdowns from cache and TLB misses. The cache and TLB behavior depends on the data layout of objects in memory. There are many possible data layouts with different impacts on performance, but it is not known which perform better. This paper presents a novel framework for evaluating data layouts. The framework both makes implementing many layouts easy, and enables performance measurements of real programs using a product Java virtual machine on stock hardware. This is achieved by sorting objects during copying garbage collection; outside of garbage collection, program performance is solely determined by the data layout that the sort key implements. This paper surveys and evaluates 10 common data layouts with 32 realistic benchmark programs running on 3 different hardware configurations. The results confirm the importance of data layouts for program performance, and show that almost all layouts yield the best performance for some programs and the worst performance for others.

By: Martin Hirzel

Published in: RC24218 in 2007

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc24218.pdf

Questions about this service can be mailed to reports@us.ibm.com .