Flow and Congestion Control for Datacenter Networks

The limits of power dissipation and Moore's law are driving a trend toward increasing parallelism and a shift of focus from CPUs to interconnection networks. This trend is also reflected in the rise of blade-based datacenters, which cluster server and storage units, packaged as blades, and connect them through several networks. We begin with the trends and requirements of datacenter interconnection networks. Next, we show that lossless link-level flow control is a necessary feature of such networks, required for both correctness and performance. However, such flow control schemes have a side effect: saturation-tree congestion, which can cause catastrophic performance collapse. We argue that the ongoing trends toward increased efficiency, consolidation, and virtualization will raise the likelihood of congestive collapse. This amplifies the need for congestion management with both prevention and recovery mechanisms. Solutions established in best-effort networks (TCP/IP and ATM) are not directly suitable, mainly because they assume a lossy link layer. We advocate research in flow and congestion control for lossless interconnects to meet the challenges of ubiquitous parallelism.

By: M. Gusat, C. Minkenberg, G.J. Paljak

Published in: RZ3742 in 2009


