Survey on Job Scheduling, Load Balancing and Fault Tolerance Techniques for Computational Grids

Jasma Balasangameshwara

Abstract

A computational grid is a network of loosely coupled, heterogeneous and geographically dispersed computers that act together to perform a large compute-intensive job. In this article, we focus on existing approaches to the grid scheduling, load balancing and fault tolerance problems. Although these are active research areas in grid computing, they have largely been, and continue to be, developed independently of one another, each focusing on different aspects of computing. In this survey, we hope to show that robust applications that deliver efficient results can be designed by considering these areas collectively. To this end, we first introduce the motivation for grid computing and its scheduling, load balancing and fault tolerance concepts, and then discuss the works that have made significant contributions to each of these areas since its inception until 2013. We discuss their advantages and disadvantages and analyze their suitability for use in a dynamic grid environment. We conclude that, while important advances have been made in each of these areas individually, high-performance approaches that consider them jointly remain to be explored. We also discuss the research that is missing and what we believe the community should be considering. To the best of our knowledge, no such survey has been conducted in the literature to date.
