DGEMM using tensor cores, and its accurate and reproducible versions

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Research output: Conference contribution

9 Citations (Scopus)

Abstract

This paper proposes a method for implementing dense matrix multiplication on FP64 (DGEMM) and FP32 (SGEMM) using Tensor Cores on NVIDIA’s graphics processing units (GPUs). Tensor Cores are special processing units that perform 4×4 matrix multiplications on FP16 inputs with FP32 precision, and return the result on FP32. The proposed method adopts the Ozaki scheme, an accurate matrix multiplication algorithm based on error-free transformation for matrix multiplication. The proposed method has three prominent advantages: first, it can be built upon the cublasGemmEx routine using Tensor Core operations; second, it can achieve higher accuracy than standard DGEMM, including the correctly-rounded result; third, it ensures bit-level reproducibility even for different numbers of cores and threads. The achievable performance of the method depends on the absolute-value range of each element of the input matrices. For example, when the matrices were initialized with random numbers over a dynamic range of 1E+9, our DGEMM-equivalent implementation achieved up to approximately 980 GFlops of FP64 operation on the Titan RTX GPU (with 130 TFlops on Tensor Cores), although cublasDgemm can achieve only 539 GFlops on FP64 floating-point units. Our results reveal the possibility of utilizing hardware with limited FP32/FP64 resources and fast low-precision processing units (such as AI-oriented processors) for general-purpose workloads.
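The slicing idea behind the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation: it works on a dot product rather than a full matrix product, uses FP64 arithmetic for the slice products (instead of FP16 inputs with FP32 accumulation on Tensor Cores), and splits by simple truncation. The names `split_slices`, `slice_bits`, and `dot_sliced` are hypothetical. Each input vector is split into slices whose elements carry so few bits that every slice-by-slice dot product is computed without rounding error; the exact partial results are then summed once with correct rounding.

```python
import math
from fractions import Fraction


def slice_bits(n):
    """Bits per slice element so that an n-term dot product of two slices
    is exact in FP64: 2*beta + ceil(log2(n)) <= 53 (assumes n >= 1)."""
    return (53 - (n - 1).bit_length()) // 2


def split_slices(x, beta):
    """Split x (finite floats) into slices that sum exactly to x.

    Every element of a slice is an integer k with |k| < 2**beta times a
    power of two shared by the whole slice.
    """
    slices = []
    rest = list(x)
    while any(r != 0.0 for r in rest):
        # frexp gives e with max|rest| < 2**e
        _, e = math.frexp(max(abs(r) for r in rest))
        scale = math.ldexp(1.0, e - beta)
        # Keep the leading bits of each element; scaling by a power of two,
        # truncation, and the subtraction below are all exact.
        hi = [math.trunc(r / scale) * scale for r in rest]
        rest = [r - h for r, h in zip(rest, hi)]
        slices.append(hi)
    return slices


def dot_sliced(x, y):
    """Dot product built from exact slice-by-slice partial dot products."""
    beta = slice_bits(len(x))
    xs = split_slices(x, beta)
    ys = split_slices(y, beta)
    # Each partial dot is free of rounding error, so its value does not
    # depend on the order in which the terms are accumulated.
    partial = [sum(a * b for a, b in zip(sx, sy)) for sx in xs for sy in ys]
    # fsum rounds the exact sum of the partial results once.
    return math.fsum(partial)


if __name__ == "__main__":
    x = [1.0e9, 3.141592653589793, -2.5e-3, 7.0]
    y = [1.0e-9, 2.718281828459045, 4.0e8, -1.0e5]
    reference = float(sum(Fraction(a) * Fraction(b) for a, b in zip(x, y)))
    print(dot_sliced(x, y), reference)
```

In this sketch the number of slices grows with the spread of exponents in the input, which mirrors the abstract's remark that performance depends on the absolute-value range of the matrix elements. Because the partial products are exact, the result is independent of how the work is partitioned, which is the basis of the reproducibility claim.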

Original language: English
Title of host publication: High Performance Computing - 35th International Conference, ISC High Performance 2020, Proceedings
Editors: Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief
Publisher: Springer
Pages: 230-248
Number of pages: 19
ISBN (Print): 9783030507428
DOI
Publication status: Published - 2020
Event: 35th International Conference on High Performance Computing, ISC High Performance 2020 - Frankfurt, Germany
Duration: 22 Jun 2020 to 25 Jun 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12151 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 35th International Conference on High Performance Computing, ISC High Performance 2020
Country/Territory: Germany
City: Frankfurt
Period: 2020/6/22 to 2020/6/25

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
