Unbiased QCN for Scalable Server-Fabrics (extended version)

Ethernet is the predominant Layer-2 networking technology in the datacenter, and is evolving into an economical alternative for high-performance computing clusters. Ethernet has traditionally dropped packets in the event of congestion, but IEEE is striving to introduce lossless class services to enable the convergence of storage, cluster, and IP networks. Losslessness is a simple, well-known concept that may offer substantial benefits, but its application in datacenters is hampered by the fear of ensuing saturation trees. In this work, we aim to accelerate the deployment of Quantized Congestion Notification (QCN), which IEEE has standardized, by making it compatible with emerging server-rack fabrics. In particular, we first eliminate the intrinsic unfairness of QCN under typical fan-in scenarios by installing the congestion points at inputs instead of at outputs, as standard QCN does. We then demonstrate that QCN at input buffers cannot always discriminate between culprit and victim flows, and propose a novel QCN-compatible marking scheme, namely occupancy sampling. Finally, we study the interactions between QCN and PAUSE. We have implemented our methods in a next-generation server-rack fabric with 640 100G ports. Our experiments on both 10G and 100G links show that the combined approach rectifies QCN’s fairness and reduces the PAUSE period. The proposed enhancements are thus a significant step forward in scaling converged datacenter networks.

By: Nikolaos Chrysos, Fredy Neeser, Rolf Clauberg, Daniel Crisan, Kenneth Valk, Claude Basso, Cyriel Minkenberg, Mitch Gusat

Published as: IBM Research Report RZ3880, 2014


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.