Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Floating-point computations generally incur rounding errors, so a result may be inaccurate and may not be identical from run to run (non-reproducible). Heterogeneous computing in particular has many factors that affect reproducibility. The loss of accuracy and reproducibility can be a crucial issue both for debugging complex codes and for the reliability of computations. In this paper, we propose high-performance implementations of reproducible basic linear algebra subprograms (BLAS) routines with tunable accuracy for many-core architectures. Our approach is based on an accurate matrix-multiplication method, the Ozaki scheme, which can be constructed on level-3 BLAS that performs standard floating-point operations. We demonstrate the performance of three routines, inner product (DOT), matrix-vector multiplication (GEMV), and matrix multiplication (GEMM), on NVIDIA's Volta GPU by comparing them with the standard routines provided by the vendor. Furthermore, we demonstrate the reproducibility between CPU and GPU, as well as the accuracy of the routines.
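To make the idea in the abstract concrete, here is a minimal pure-Python sketch of an Ozaki-style inner product. It is not the authors' implementation: the function names `split`, `plain_dot`, and `ozaki_dot` are hypothetical, the final accumulation uses Python's `math.fsum` as a stand-in for an accurate summation step (an assumption), and overflow and underflow are ignored. Each input vector is split into slices with few enough significant bits that an ordinary floating-point dot product of any two slices incurs no rounding error; only the final sum of those exact partial results is rounded.

```python
import math
from fractions import Fraction


def split(v):
    """Split a non-empty list of finite floats into slices that sum back to v exactly."""
    n = len(v)
    # Number of bits to cut so an n-term dot product of two slices fits in 53 bits.
    beta = math.ceil((53 + math.log2(n)) / 2)
    rest = list(v)
    slices = []
    while True:
        mu = max(abs(t) for t in rest)
        if mu == 0.0:
            return slices
        tau = math.frexp(mu)[1]              # guarantees mu < 2**tau
        sigma = math.ldexp(1.0, beta + tau)  # 2**(beta + tau)
        top = [(t + sigma) - sigma for t in rest]   # leading bits of each element
        rest = [t - s for t, s in zip(rest, top)]   # remainder, computed without error
        slices.append(top)


def plain_dot(a, b):
    """Ordinary floating-point dot product; error-free when a and b are slices."""
    acc = 0.0
    for s, t in zip(a, b):
        acc += s * t
    return acc


def ozaki_dot(x, y):
    """Dot product built from error-free slice products plus one rounded summation."""
    partials = [plain_dot(a, b) for a in split(x) for b in split(y)]
    return math.fsum(partials)  # stand-in for an accurate summation (assumption)


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16, 0.1]
    y = [1.0, 3.0, 1.0, 7.0]
    exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    print("naive :", plain_dot(x, y))
    print("sliced:", ozaki_dot(x, y))
    print("exact :", float(exact))
```

Because every slice product is exact and only the last summation rounds, the result in this sketch does not depend on the order of the elements. Keeping fewer slices would trade accuracy for speed, which is one way to read "tunable accuracy"; the sketch keeps all of them.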

Original language: English
Title of host publication: Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers
Editors: Roman Wyrzykowski, Konrad Karczewski, Ewa Deelman, Jack Dongarra
Publisher: Springer
Pages: 516-527
Number of pages: 12
ISBN (Print): 9783030432287
DOI: https://doi.org/10.1007/978-3-030-43229-4_44
Publication status: Published - 2020
Event: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 - Bialystok, Poland
Duration: 8 September 2019 to 11 September 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12043 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019
Country: Poland
City: Bialystok
Period: 8 September 2019 to 11 September 2019

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Mukunoki, D., Ogita, T., & Ozaki, K. (2020). Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures. In R. Wyrzykowski, K. Karczewski, E. Deelman, & J. Dongarra (Eds.), Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers (pp. 516-527). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12043 LNCS). Springer. https://doi.org/10.1007/978-3-030-43229-4_44