Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs

Hiroki Matsutani, Michihiro Koibuchi, Daisuke Ikebuchi, Kimiyoshi Usami, Hiroshi Nakamura, Hideharu Amano

Research output: Contribution to journal > Article

27 Citations (Scopus)

Abstract

This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, and crossbar multiplexer and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can preliminarily detect the next packet arrival and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction taking into account the on/off energy overhead. The simulation results showed that it reduces the leakage power by 54.4-59.9% on average even when the application programs are fully running, at the expense of 4.6% of the area and 0.7-3.7% of the performance overheads when we assume a 1 GHz operation.
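The trade-off the abstract describes (leakage saved while a component sleeps, against the on/off energy overhead and the wakeup latency) can be illustrated with a minimal sketch. This is not the authors' evaluation model. The function names, the integer energy units, and the simplifying assumption that an early wakeup shortens the sleep period by exactly the wakeup latency are all hypothetical.

```python
def net_leakage_saving(idle_cycles, leak_per_cycle, onoff_overhead, wakeup_cycles=0):
    """Net energy saved by gating one component over one idle period.

    All quantities are hypothetical integers in arbitrary energy units.
    """
    # Assumption: with early wakeup, the component starts waking up
    # wakeup_cycles before the next packet arrives, so it only sleeps
    # for the remainder of the idle period.
    asleep = max(idle_cycles - wakeup_cycles, 0)
    # Leakage avoided while asleep, minus the fixed cost of switching off and on.
    return asleep * leak_per_cycle - onoff_overhead


def break_even_cycles(leak_per_cycle, onoff_overhead, wakeup_cycles=0):
    """Shortest idle period for which gating does not lose energy."""
    # Integer ceiling division: sleep cycles needed to pay back the overhead.
    payback = -(-onoff_overhead // leak_per_cycle)
    return wakeup_cycles + payback


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(break_even_cycles(leak_per_cycle=2, onoff_overhead=10, wakeup_cycles=3))
    print(net_leakage_saving(idle_cycles=20, leak_per_cycle=2, onoff_overhead=10, wakeup_cycles=3))
```

Under these assumptions, gating a component for an idle period shorter than the break-even length costs more energy than it saves, which is why the evaluation accounts for the on/off energy overhead.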

Original language: English
Article number: 5737865
Pages (from-to): 520-533
Number of pages: 14
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume: 30
Issue number: 4
DOI: 10.1109/TCAD.2011.2110470
Publication status: Published - 2011 Apr

Keywords

  • Low power
  • on-chip networks
  • power gating
  • router architecture

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs. / Matsutani, Hiroki; Koibuchi, Michihiro; Ikebuchi, Daisuke; Usami, Kimiyoshi; Nakamura, Hiroshi; Amano, Hideharu. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 30, No. 4, 5737865, 04.2011, p. 520-533.

Matsutani, Hiroki; Koibuchi, Michihiro; Ikebuchi, Daisuke; Usami, Kimiyoshi; Nakamura, Hiroshi; Amano, Hideharu. / Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2011; Vol. 30, No. 4. pp. 520-533.

@article{74ca215bf494491f9cd95e5df1f8ee87,
title = "Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs",
abstract = "This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, and crossbar multiplexer and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can preliminarily detect the next packet arrival and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction taking into account the on/off energy overhead. The simulation results showed that it reduces the leakage power by 54.4-59.9{\%} on average even when the application programs are fully running, at the expense of 4.6{\%} of the area and 0.7-3.7{\%} of the performance overheads when we assume a 1 GHz operation.",
keywords = "Low power, on-chip networks, power gating, router architecture",
author = "Hiroki Matsutani and Michihiro Koibuchi and Daisuke Ikebuchi and Kimiyoshi Usami and Hiroshi Nakamura and Hideharu Amano",
year = "2011",
month = apr,
doi = "10.1109/TCAD.2011.2110470",
language = "English",
volume = "30",
pages = "520--533",
journal = "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems",
issn = "0278-0070",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY  - JOUR
T1  - Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs
AU  - Matsutani, Hiroki
AU  - Koibuchi, Michihiro
AU  - Ikebuchi, Daisuke
AU  - Usami, Kimiyoshi
AU  - Nakamura, Hiroshi
AU  - Amano, Hideharu
PY  - 2011/4
Y1  - 2011/4
AB  - This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, and crossbar multiplexer and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can preliminarily detect the next packet arrival and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction taking into account the on/off energy overhead. The simulation results showed that it reduces the leakage power by 54.4-59.9% on average even when the application programs are fully running, at the expense of 4.6% of the area and 0.7-3.7% of the performance overheads when we assume a 1 GHz operation.
KW  - Low power
KW  - on-chip networks
KW  - power gating
KW  - router architecture
UR  - http://www.scopus.com/inward/record.url?scp=79953071090&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79953071090&partnerID=8YFLogxK
U2  - 10.1109/TCAD.2011.2110470
DO  - 10.1109/TCAD.2011.2110470
M3  - Article
VL  - 30
SP  - 520
EP  - 533
JO  - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF  - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
SN  - 0278-0070
IS  - 4
M1  - 5737865
ER  -