Distributed scheduling and data sharing in late-binding overlays

Huedo Cuesta, Eduardo and Delgado Peris, Antonio and Hernández, José M. (2014) Distributed scheduling and data sharing in late-binding overlays. Proceedings of the 2014 International Conference on High Performance Computing and Simulation (690367). pp. 129-136.



Abstract

Pull-based late-binding overlays are used in some of today’s largest computational grids. Job agents are submitted to resources and, once running, retrieve the real workload from a central queue. This helps to overcome the main problems of these very complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays that addresses these scalability issues. In our system, the execution nodes build a distributed hash table, and job matching and assignment are delegated to them. This reduces the load on the central server and makes the system much more scalable and robust. Moreover, the improved scalability makes fine-grained scheduling possible and enables new functionality, such as a distributed data cache on the execution nodes, which helps relieve the often congested grid storage services.
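
The abstract does not describe the paper's matching algorithm in detail, so the following is only a hypothetical illustration of the two ideas it names: execution nodes sharing scheduling work through a distributed hash table, and agents pulling jobs only when ready (late binding). It assumes consistent hashing over node IDs as a stand-in for a real DHT, and it assumes jobs are keyed by their input file so that jobs reading the same data meet on a node that may already cache it. All names (`HashRing`, `DistributedScheduler`, `submit`, `pull`, the `/store/run42.root` path) are invented for this sketch and are not the authors' implementation.

```python
import hashlib
from bisect import bisect_right
from collections import deque
from typing import Optional, Tuple


def ring_hash(key: str) -> int:
    """Map a string to a position on a 32-bit ring (SHA-1 prefix; an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Consistent-hashing ring over execution-node IDs, standing in for a DHT."""

    def __init__(self, node_ids):
        points = sorted((ring_hash(n), n) for n in node_ids)
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def owner(self, key: str) -> str:
        """The node responsible for `key` is the first one clockwise from its hash."""
        idx = bisect_right(self._hashes, ring_hash(key)) % len(self._nodes)
        return self._nodes[idx]


class Node:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.queue = deque()  # jobs this node is responsible for matching
        self.cache = set()    # input files already held in this node's local cache


class DistributedScheduler:
    """Hypothetical scheduler: the job queue is partitioned across nodes instead of held centrally."""

    def __init__(self, node_ids):
        self.ring = HashRing(node_ids)
        self.nodes = {n: Node(n) for n in node_ids}

    def submit(self, job_id: str, input_file: str) -> str:
        # Assumption: jobs are keyed by their input file, so jobs reading the
        # same data are matched on the same node, where the file may be cached.
        owner = self.ring.owner(input_file)
        self.nodes[owner].queue.append((job_id, input_file))
        return owner

    def pull(self, node_id: str) -> Optional[Tuple[str, bool]]:
        """Late binding: an idle agent asks for a job only when it can run it.

        Returns (job_id, cache_hit), or None if this node has no queued work.
        """
        node = self.nodes[node_id]
        if not node.queue:
            return None
        job_id, input_file = node.queue.popleft()
        cache_hit = input_file in node.cache
        node.cache.add(input_file)  # the file stays cached for later jobs
        return job_id, cache_hit


if __name__ == "__main__":
    sched = DistributedScheduler(["node-a", "node-b", "node-c"])
    owner = sched.submit("job-1", "/store/run42.root")
    sched.submit("job-2", "/store/run42.root")  # same input file -> same owner
    print(sched.pull(owner))  # ('job-1', False): first read misses the cache
    print(sched.pull(owner))  # ('job-2', True): second read hits it
```

No central queue is consulted at pull time, which is the property the abstract credits for scalability. A real system would also need idle nodes to look up work held by other nodes and to cope with nodes joining or failing; the sketch omits both.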


Item Type: Article
Additional Information:

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Uncontrolled Keywords: Grid and Cluster Computing, Scalable Computing, Peer-to-Peer Architectures and Networks, Reliable Parallel and Distributed Algorithms
Subjects: Sciences > Computer science > Computer programming
Sciences > Computer science > Computer networks
ID Code: 42843
Deposited On: 22 May 2017 11:30
Last Modified: 22 May 2017 14:46
