Blue Eyes: Scalable and Reliable System Management for Cloud Computing

With the advent of cloud computing, large-scale automated system management has become increasingly important for the successful and economical operation of computing resources. However, traditional monolithic system management solutions are designed to scale to only hundreds, or at most thousands, of systems. In this paper, we present Blue Eyes, a new system management solution with a multi-server scale-out architecture that handles hundreds of thousands of systems. Blue Eyes achieves highly scalable and reliable system management by running many management servers in a distributed manner so that they work collaboratively on management tasks. In particular, we structure the management servers into a hierarchical tree to achieve scalability, and we replicate management information to secondary servers to provide reliability and high availability. In addition, Blue Eyes is designed to extend the existing single-server implementation without significantly restructuring the code base. Experiments with the Blue Eyes prototype demonstrate that our multi-server system can reliably handle typical management tasks for a large number of endpoints, with dynamic load balancing across the servers, near-linear performance gains as servers are added, and acceptable network overhead.
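
To make the architecture described above concrete, the following is a minimal illustrative sketch of a tree of management servers in which each leaf replicates its endpoint records to a secondary server. It is not the Blue Eyes implementation: the class name ManagementServer, its methods, the least-loaded assignment rule (a simple stand-in for dynamic load balancing), and the pairing of leaves as each other's secondaries are all assumptions made for illustration.

```python
class ManagementServer:
    """Hypothetical node in a hierarchical tree of management servers."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.endpoints = {}    # records this server manages as primary
        self.replicas = {}     # copies held on behalf of another server
        self.secondary = None  # assumed: one secondary server per leaf

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def leaves(self):
        """Return all leaf servers in the subtree rooted at this server."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found

    def register(self, endpoint_id, info):
        """Assign an endpoint to the least-loaded leaf, then replicate it."""
        target = min(self.leaves(), key=lambda s: len(s.endpoints))
        target.endpoints[endpoint_id] = info
        if target.secondary is not None:
            # Keep an independent copy on the secondary for failover.
            target.secondary.replicas[endpoint_id] = dict(info)
        return target


def build_example():
    """Build a two-leaf tree whose leaves back each other up."""
    root = ManagementServer("root")
    leaf_a = root.add_child(ManagementServer("leaf-a"))
    leaf_b = root.add_child(ManagementServer("leaf-b"))
    leaf_a.secondary, leaf_b.secondary = leaf_b, leaf_a
    for i in range(4):
        root.register(f"endpoint-{i}", {"status": "ok"})
    return root, leaf_a, leaf_b
```

In this sketch, registering four endpoints through the root spreads them evenly across the two leaves, and each leaf holds a replica of the other's records, so either leaf's records could be recovered from its peer if it failed.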

By: Sukhyun Song; Kyung Dong Ryu; Dilma Da Silva

Published in: RC24721 in 2009


