Dynamic ACO-based fault tolerance in grid computing

Saufi Bukhari, Ku Ruhana Ku-Mahamud, Hiroaki Morino

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

In the distributed environment of grid computing, it is nearly impossible to schedule jobs on a completely fault-free system. It is therefore important to integrate fault tolerance into the system so that it can continue to run in the presence of failures, while also improving the scheduling process and reducing the likelihood of faults. Load balancing is typically not considered when failures occur, which may lead to an inefficient scheduling process even with a good fault tolerance strategy. This paper presents an ant-based fault tolerance algorithm that uses checkpoint and resubmission techniques and takes execution history into account in the pheromone updating process to enhance fault tolerance. Experimental results show that the proposed algorithm outperforms other relevant algorithms in execution time, success rate, and average turnaround time per job.
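The abstract does not give the update rule or the scheduling details, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a pheromone deposit scaled by each resource's past success ratio, roulette-wheel resource selection, and resubmission from the last checkpoint after a failure. All names and values (`Resource`, `update_pheromone`, `rho`, `reward`, `checkpoint_every`, `chunk_fails`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    pheromone: float = 1.0  # initial trail strength (assumed value)
    successes: int = 0      # execution history: completed submissions
    failures: int = 0       # execution history: failed submissions

    def success_ratio(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 1.0  # no history yet


def update_pheromone(res, succeeded, rho=0.1, reward=1.0):
    """Record the outcome, evaporate, then deposit a history-scaled reward."""
    if succeeded:
        res.successes += 1
    else:
        res.failures += 1
    # Assumed rule: failures deposit nothing; successes deposit in
    # proportion to the resource's overall success ratio.
    deposit = reward * res.success_ratio() if succeeded else 0.0
    res.pheromone = (1.0 - rho) * res.pheromone + rho * deposit


def choose_resource(resources, rng):
    """Roulette-wheel selection with probability proportional to pheromone."""
    weights = [r.pheromone for r in resources]
    return rng.choices(resources, weights=weights, k=1)[0]


def run_job(length, resources, chunk_fails, rng,
            checkpoint_every=10, max_submissions=1000):
    """Run `length` work units, resubmitting from the last checkpoint.

    `chunk_fails(resource, rng)` is a hypothetical failure model that
    returns True when the resource fails during the current chunk.
    Returns the number of submissions needed to finish the job.
    """
    done = 0  # work units already saved by a checkpoint
    for submission in range(1, max_submissions + 1):
        res = choose_resource(resources, rng)
        failed = False
        while done < length:
            if chunk_fails(res, rng):
                failed = True  # the unsaved chunk is lost
                break
            done = min(length, done + checkpoint_every)  # checkpoint
        update_pheromone(res, succeeded=not failed)
        if not failed:
            return submission
    raise RuntimeError("job did not finish within max_submissions")


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [Resource("stable"), Resource("flaky")]
    # Toy failure model: the resource named "flaky" always fails.
    used = run_job(30, pool, lambda r, _rng: r.name == "flaky", rng)
    print(used, [(r.name, round(r.pheromone, 3)) for r in pool])
```

In this toy setup, each failed submission lowers the failing resource's pheromone, so later submissions are less likely to select it. Checkpointing means a resubmitted job resumes from saved progress instead of starting over.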

Original language: English
Pages (from-to): 117-124
Number of pages: 8
Journal: International Journal of Grid and Distributed Computing
Volume: 10
Issue number: 12
DOI: 10.14257/ijgdc.2017.10.12.11
Publication status: Published - 2017 Jan 1

Fingerprint

Grid computing
Fault tolerance
Scheduling
Turnaround time
Resource allocation

Keywords

  • Ant colony optimization
  • Ant colony system
  • Fault tolerance
  • Grid computing
  • Job scheduling

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Dynamic ACO-based fault tolerance in grid computing. / Bukhari, Saufi; Ku-Mahamud, Ku Ruhana; Morino, Hiroaki.

In: International Journal of Grid and Distributed Computing, Vol. 10, No. 12, 01.01.2017, p. 117-124.

Bukhari, Saufi ; Ku-Mahamud, Ku Ruhana ; Morino, Hiroaki. / Dynamic ACO-based fault tolerance in grid computing. In: International Journal of Grid and Distributed Computing. 2017 ; Vol. 10, No. 12. pp. 117-124.
@article{e56c9ddba83d41ea885076e9084bf1b0,
title = "Dynamic ACO-based fault tolerance in grid computing",
keywords = "Ant colony optimization, Ant colony system, Fault tolerance, Grid computing, Job scheduling",
author = "Saufi Bukhari and Ku-Mahamud, {Ku Ruhana} and Hiroaki Morino",
year = "2017",
month = "1",
day = "1",
doi = "10.14257/ijgdc.2017.10.12.11",
language = "English",
volume = "10",
pages = "117--124",
journal = "International Journal of Grid and Distributed Computing",
issn = "2005-4262",
publisher = "Science and Engineering Research Support Society",
number = "12",

}

TY - JOUR

T1 - Dynamic ACO-based fault tolerance in grid computing

AU - Bukhari, Saufi

AU - Ku-Mahamud, Ku Ruhana

AU - Morino, Hiroaki

PY - 2017/1/1

Y1 - 2017/1/1

KW - Ant colony optimization

KW - Ant colony system

KW - Fault tolerance

KW - Grid computing

KW - Job scheduling

UR - http://www.scopus.com/inward/record.url?scp=85044931753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044931753&partnerID=8YFLogxK

U2 - 10.14257/ijgdc.2017.10.12.11

DO - 10.14257/ijgdc.2017.10.12.11

M3 - Article

VL - 10

SP - 117

EP - 124

JO - International Journal of Grid and Distributed Computing

JF - International Journal of Grid and Distributed Computing

SN - 2005-4262

IS - 12

ER -