Fault tolerance grid scheduling with checkpoint based on Ant Colony System

Saufi Bukhari, Ku Ruhana Ku-Mahamud, Hiroaki Morino

Research output: Peer-reviewed article

Abstract

Task resubmission and checkpointing are among several popular techniques used to provide fault tolerance in grid computing. However, because the two have not been compared side by side, it is unclear which technique provides fault tolerance without degrading system performance. This study proposes Dynamic ACS-based Fault Tolerance for grid computing, which combines resubmission to a new resource, checkpointing, and the use of resource execution history, with the aim of reducing execution time and task processing time and increasing the success rate in a grid environment. The proposed algorithm is compared with other relevant algorithms in terms of execution time, success rate, and average processing time. The results suggest that the proposed algorithm, with improved task resubmission, checkpointing, and an extended pheromone update formula, performs better at managing execution failures and at selecting resources during task assignment or resubmission.
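The abstract does not give the extended pheromone update formula or the checkpoint policy, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes the generic ACS-style update tau <- (1 - rho) * tau + rho * reward, a fixed checkpoint interval, and a toy failure model; all names (`choose_resource`, `update_pheromone`, `run_task`, `units_before_failure`) are hypothetical.

```python
import random


def choose_resource(pheromone, failed, rng):
    """Pick a resource with probability proportional to its pheromone,
    skipping resources that already failed for this task."""
    candidates = [r for r in pheromone if r not in failed]
    if not candidates:
        raise RuntimeError("no resource left to try")
    weights = [pheromone[r] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def update_pheromone(pheromone, resource, reward, rho=0.1):
    """Generic ACS-style update (assumption, not the paper's extended formula):
    tau <- (1 - rho) * tau + rho * reward."""
    pheromone[resource] = (1.0 - rho) * pheromone[resource] + rho * reward


def run_task(total_units, units_before_failure, pheromone,
             checkpoint_every=10, seed=0):
    """Run one task of `total_units` work units with checkpoint and resubmission.

    units_before_failure maps a resource to the number of units it can execute
    before failing (toy failure model); resources not listed never fail.
    Returns (resource that finished the task, units executed incl. redone work).
    """
    rng = random.Random(seed)
    checkpoint = 0   # progress saved at the last checkpoint
    executed = 0     # total work spent, including work lost to failures
    failed = set()
    while True:
        resource = choose_resource(pheromone, failed, rng)
        budget = units_before_failure.get(resource, total_units)
        ran = min(budget, total_units - checkpoint)
        executed += ran
        progress = checkpoint + ran
        if progress >= total_units:
            # Success: reinforce the resource in the execution history.
            update_pheromone(pheromone, resource, reward=1.0)
            return resource, executed
        # Failure: only work up to the last checkpoint survives, and the
        # task is resubmitted to a different resource from that point.
        checkpoint = progress - (progress % checkpoint_every)
        failed.add(resource)
        update_pheromone(pheromone, resource, reward=0.0)


# Example: resource "A" fails after 35 units, resource "B" never fails.
pheromone = {"A": 0.5, "B": 0.5}
finished_on, work = run_task(100, {"A": 35}, pheromone)
print(finished_on, work, pheromone)
```

In this sketch a failure costs only the work done since the last checkpoint, and the pheromone values act as a simple execution history: resources that fail become less likely to be selected for later assignments or resubmissions.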

Language: English
Pages: 363-370
Number of pages: 8
Journal: Journal of Computer Science
Volume: 13
Issue number: 8
DOI: 10.3844/jcssp.2017.363.370
State: Published - 2017

Fingerprint

Fault tolerance
Scheduling
Grid computing
Processing

Keywords

  • Ant Colony System
  • Fault tolerance
  • Grid computing
  • Task checkpoint
  • Task resubmission

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Fault tolerance grid scheduling with checkpoint based on Ant Colony System. / Bukhari, Saufi; Ku-Mahamud, Ku Ruhana; Morino, Hiroaki.

In: Journal of Computer Science, Vol. 13, No. 8, 2017, p. 363-370.

@article{09e11dc611484b50b868d540869df9f4,
title = "Fault tolerance grid scheduling with checkpoint based on Ant Colony System",
abstract = "Task resubmission and checkpointing are among several popular techniques used to provide fault tolerance in grid computing. However, because the two have not been compared side by side, it is unclear which technique provides fault tolerance without degrading system performance. This study proposes Dynamic ACS-based Fault Tolerance for grid computing, which combines resubmission to a new resource, checkpointing, and the use of resource execution history, with the aim of reducing execution time and task processing time and increasing the success rate in a grid environment. The proposed algorithm is compared with other relevant algorithms in terms of execution time, success rate, and average processing time. The results suggest that the proposed algorithm, with improved task resubmission, checkpointing, and an extended pheromone update formula, performs better at managing execution failures and at selecting resources during task assignment or resubmission.",
keywords = "Ant Colony System, Fault tolerance, Grid computing, Task checkpoint, Task resubmission",
author = "Saufi Bukhari and Ku-Mahamud, {Ku Ruhana} and Hiroaki Morino",
year = "2017",
doi = "10.3844/jcssp.2017.363.370",
volume = "13",
pages = "363--370",
journal = "Journal of Computer Science",
issn = "1549-3636",
publisher = "Science Publications",
number = "8",

}

TY - JOUR

T1 - Fault tolerance grid scheduling with checkpoint based on Ant Colony System

AU - Bukhari, Saufi

AU - Ku-Mahamud, Ku Ruhana

AU - Morino, Hiroaki

PY - 2017

Y1 - 2017

AB - Task resubmission and checkpointing are among several popular techniques used to provide fault tolerance in grid computing. However, because the two have not been compared side by side, it is unclear which technique provides fault tolerance without degrading system performance. This study proposes Dynamic ACS-based Fault Tolerance for grid computing, which combines resubmission to a new resource, checkpointing, and the use of resource execution history, with the aim of reducing execution time and task processing time and increasing the success rate in a grid environment. The proposed algorithm is compared with other relevant algorithms in terms of execution time, success rate, and average processing time. The results suggest that the proposed algorithm, with improved task resubmission, checkpointing, and an extended pheromone update formula, performs better at managing execution failures and at selecting resources during task assignment or resubmission.

KW - Ant Colony System

KW - Fault tolerance

KW - Grid computing

KW - Task checkpoint

KW - Task resubmission

UR - http://www.scopus.com/inward/record.url?scp=85029771397&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029771397&partnerID=8YFLogxK

U2 - 10.3844/jcssp.2017.363.370

DO - 10.3844/jcssp.2017.363.370

M3 - Article

VL - 13

SP - 363

EP - 370

JO - Journal of Computer Science

T2 - Journal of Computer Science

JF - Journal of Computer Science

SN - 1549-3636

IS - 8

ER -