This report documents the results of work done over a six-year period under the FAST-OS programs. The first effort, called Right-Weight Kernels (RWK), was concerned with improving measurements of OS noise so that it could be treated quantitatively, and with evaluating the use of two operating systems, Linux and Plan 9, on HPC systems to determine how they needed to be extended or changed for HPC while still retaining their general-purpose nature.

The second program, HARE, explored the creation of alternative runtime models, building on RWK. All of the HARE work was done on Plan 9. The HARE researchers were mindful of the very good Linux and LWK work being done at other labs and saw no need to recreate it.

The organizations included LANL (RWK) and Sandia (RWK, HARE), as the PI moved to Sandia; IBM; Bell Labs; and Vita Nuova, as a subcontractor to Bell Labs. In any given year, the funding was sufficient to cover a PI from each organization part time.

Even given this limited funding, the two efforts had outsized impact:

    • Helped Cray decide to use Linux, instead of a custom kernel, and provided the tools needed to make Linux perform well
    • Created a successor operating system to Plan 9, NIX, which has been taken in by Bell Labs for further development
    • Created a standard system measurement tool, Fixed Time Quantum or FTQ, which is widely used for measuring operating system impact on applications
    • Spurred the use of the 9p protocol in several organizations, including IBM
    • Built software in use at many companies, including IBM, Cray, and Google
    • Spurred the creation of alternative runtimes for use on HPC systems
    • Demonstrated that, with proper modifications, a general-purpose operating system can provide communications up to 3 times as effective as user-level libraries

We describe details of these impacts in the following sections. The rest of this report is organized as follows: first, we describe commercial impact; next, we describe the FTQ benchmark and its impact in more detail; operating systems and runtime research follows; we discuss infrastructure software; and close with a description of the new NIX operating system, future work, and conclusions.

By: Eric Van Hensbergen; Ron Minnich; Jim Mckie; Charles Forsyth

Published in: RC25241 in 2011
