A Quantitative Analysis of Space Waste from Java Strings and its Elimination at Garbage Collection Time

This paper describes a novel approach to reducing the memory consumption of Java programs by reducing the "string memory waste" in the runtime. In recent Java applications, string data occupies a large part of the heap. For example, more than 30% of the live heap area is used for string data when WebSphere Application Server is running Trade6. By investigating the string data in real Java applications, we found two types of memory waste in typical Java string implementations. First, there are many String objects that have the same value. Second, there are many unused areas in the char arrays used to hold the string values. This string memory waste exists as, or inside, live objects, so it cannot be eliminated by existing garbage collection techniques, which only remove dead objects. Quantitative analysis of the Java heap revealed that such waste occupied up to 17% of the live heap area even in real Java applications.
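To make the two kinds of waste concrete, the following is a minimal sketch in a toy model, not the measurement method used in the report (the abstract does not describe it). The names `ToyString`, `StringWasteSketch`, `duplicateChars`, and `unusedChars` are hypothetical. The model assumes a string is a backing char array plus an (offset, count) window into it, that backing arrays are not shared between strings, and it counts chars rather than bytes and ignores object headers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (assumption): a backing char array plus the window of it that is in use. */
final class ToyString {
    final char[] value;
    final int offset;
    final int count;

    ToyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** The logical string value, i.e. the chars inside the window. */
    String text() {
        return new String(value, offset, count);
    }
}

final class StringWasteSketch {
    /** First kind of waste: chars held by strings whose value repeats an earlier string's value. */
    static long duplicateChars(List<ToyString> live) {
        Map<String, Integer> seen = new HashMap<>();
        long wasted = 0;
        for (ToyString s : live) {
            // merge returns the updated occurrence count for this value
            if (seen.merge(s.text(), 1, Integer::sum) > 1) {
                wasted += s.count;
            }
        }
        return wasted;
    }

    /** Second kind of waste: chars allocated in a backing array but outside the window in use. */
    static long unusedChars(List<ToyString> live) {
        long wasted = 0;
        for (ToyString s : live) {
            wasted += s.value.length - s.count;
        }
        return wasted;
    }

    public static void main(String[] args) {
        List<ToyString> live = List.of(
            new ToyString("hello".toCharArray(), 0, 5),
            new ToyString("hello".toCharArray(), 0, 5),        // same value as the first
            new ToyString("hello world".toCharArray(), 6, 5)); // uses only "world"
        System.out.println("duplicate chars: " + duplicateChars(live));
        System.out.println("unused chars: " + unusedChars(live));
    }
}
```

In this toy heap every object is live, so a collector that only reclaims dead objects would recover none of the chars counted by either method.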

To remove the string memory waste, we propose a new "string garbage collection" (StringGC) technique for Java. StringGC works alongside the ordinary garbage collector in a JVM, unifying same-value String objects and removing the unused areas in char arrays. In an IBM production JVM, we implemented a StringGC prototype named "UNITE", in which same-value strings are unified when they are tenured by a generational GC. This prototype eliminated more than 90% of the string memory waste and reduced the live heap size of real Java applications by up to 15% without noticeable performance degradation.
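The idea of unifying same-value strings can be illustrated at the application level. The sketch below is not the UNITE implementation, which works inside the JVM at tenuring time and whose mechanism the abstract does not describe. It is an assumed analogue in which a table of canonical instances is used to redirect references held in an array. The names `UnifySketch` and `unify` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class UnifySketch {
    /**
     * Redirects every slot to one canonical String instance per distinct value.
     * Returns the number of slots that were redirected to a different object.
     */
    static int unify(String[] slots) {
        Map<String, String> canonical = new HashMap<>();
        int redirected = 0;
        for (int i = 0; i < slots.length; i++) {
            String s = slots[i];
            if (s == null) {
                continue;
            }
            // putIfAbsent returns the earlier instance with an equal value, or null if s is the first
            String first = canonical.putIfAbsent(s, s);
            if (first != null && first != s) {
                slots[i] = first; // the duplicate object becomes unreachable from this slot
                redirected++;
            }
        }
        return redirected;
    }

    public static void main(String[] args) {
        // new String(...) forces distinct objects with equal values
        String[] slots = { new String("trade"), new String("trade"), new String("quote") };
        System.out.println("redirected: " + unify(slots));
        System.out.println("same instance: " + (slots[0] == slots[1]));
    }
}
```

Once the references are redirected, the duplicate objects are dead and an ordinary collector can reclaim them. A collector-integrated approach such as the one the report proposes would not need the application to hand over its references in this way.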

By: Kiyokuni Kawachiya, Kazunori Ogata, and Tamiya Onodera

Published in: 2007


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

