Scheduling Algorithms for Distributed Web Servers

A distributed Web site can provide the scalability necessary to keep up with growing client demand at popular sites. Load balancing of these distributed systems (consisting of multiple Web servers for data retrieval, and a Domain Name Server (DNS) for address resolution) opens interesting new problems. In this paper, we investigate the effects of using a more active DNS which, as an atypical centralized scheduler, applies some strategy in routing the requests to the best Web server. Unlike traditional parallel/distributed systems in which a centralized scheduler has full control of the system, the DNS controls only a small fraction of the requests reaching the Web site. This peculiarity makes it very difficult to achieve acceptable load balancing and avoid overloading situations among the multiple Web servers. This paper adaps traditional scheduling algorithms to the DNS, proposes new policies, and examines their impact under several scenarios. Extensive simulation results show the advantage of strategies that take into account the source of the requests and the state of the servers. An initially unexpected result is that the best performance is achieved by algorithms that use very limited state information (that is, whether a server is overloaded or not). Conversely, detailed information such as queue lengths, present and past utilizations, do not appear useful, and can often lead to degraded performance.

By: Michele Colajanni, Philip S. Yu and Daniel M. Dias

Published in: Proceedings Distributed Computing Systems. Los Alamitos, CA, IEEE Computer Society Press, 1997. p. 169-76, IEEE in 1997

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .