Publication details

Towards Peer-to-Peer Scheduling Architecture for the Czech National Grid

Authors

TÓTH Šimon, RUDA Miroslav, MATYSKA Luděk

Year of publication 2011
Type Article in Proceedings
Description The Czech National Grid Infrastructure MetaCentrum has used a central scheduler for approximately the past 10 years. This setup simplified administration and directly supported large jobs running across several geographical sites. Knowledge of the complete system state allowed the scheduler to make high-quality decisions incorporating features such as fairshare. On the other hand, the central setup created a single point of failure and reached its scalability limits. In this paper we describe our work towards a new distributed architecture that maintains high scheduling quality while solving most of the single-server issues. The architecture provides local autonomy: users can still submit jobs locally even when cross-site connectivity is lost. Individual schedulers work primarily with their local server but still maintain global state, which allows them to mimic centralised scheduling features. The architecture still supports central accounting and fairshare across the entire grid. The implementation is based on the open-source Torque batch system, which replaced the previous commercial PBSPro central server installation. Torque shares a common ancestor, OpenPBS, with PBSPro, so its codebase is similar and it provides a familiar interface for both users and developers.