Designing a Highly-Scalable Operating System: The Blue Gene/L Story

Blue Gene/L is currently the world’s fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these nodes reflect this separation of function. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of the Linux operating system. This paper discusses the architecture and design of this solution for Blue Gene/L in the context of the hardware characteristics that led to the design decisions. It also explains and demonstrates how those decisions are instrumental in achieving the performance and scalability for which Blue Gene/L is famous.

By: José Moreira, Michael Brutman, José Castaños, Thomas Engelsiepen, Mark Giampapa, Tom Gooding, Roger Haskin, Todd Inglett, Derek Lieber, Pat McCarthy, Mike Mundy, Jeff Parker, Brian Wallenfeld

Published in: RC24037 in 2006


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

