Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Floating-point computations generally incur rounding errors, so the result may be inaccurate and may not be identical between executions (non-reproducible). Heterogeneous computing in particular has many factors that affect reproducibility. This loss of accuracy and reproducibility can be a crucial issue for debugging complex codes and for the reliability of computations. In this paper, we propose high-performance implementations of reproducible basic linear algebra subprograms (BLAS) routines with tunable accuracy for many-core architectures. Our approach is based on an accurate matrix-multiplication method, the Ozaki scheme, which can be built on top of level-3 BLAS routines that perform standard floating-point operations. We demonstrate the performance of three routines, inner product (DOT), matrix-vector multiplication (GEMV), and matrix multiplication (GEMM), on NVIDIA’s Volta GPU by comparing them with the standard routines provided by the vendor. Furthermore, we demonstrate the accuracy of the routines and the reproducibility of their results between CPU and GPU.
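The idea behind the abstract can be illustrated with a small sketch for the inner product. This is not the authors' implementation: it is a simplified pure-Python illustration of an Ozaki-style error-free splitting, and the names `split`, `ozaki_dot`, and `max_slices` are hypothetical. It assumes IEEE 754 double precision, finite inputs, and no overflow or underflow. Each vector is split into slices with few enough significant bits that every slice-by-slice dot product is computed without rounding error, so those partial results do not depend on the order of accumulation.

```python
import math
from fractions import Fraction


def split(vec, rho, max_slices=None):
    """Split vec into slices whose elementwise sum equals vec exactly.

    Each slice keeps roughly the leading (53 - rho) bits of every element,
    measured relative to the largest magnitude still left in the vector.
    Assumes finite inputs (hypothetical helper, illustration only).
    """
    rest = list(vec)
    slices = []
    while any(v != 0.0 for v in rest):
        if max_slices is not None and len(slices) == max_slices:
            break  # stopping early trades accuracy for fewer products
        mu = max(abs(v) for v in rest)
        _, e = math.frexp(mu)                       # mu < 2**e
        sigma = math.ldexp(1.0, e + rho)            # a large power of two
        high = [(v + sigma) - sigma for v in rest]  # leading bits of each element
        rest = [v - h for v, h in zip(rest, high)]  # exact remainder
        slices.append(high)
    return slices


def ozaki_dot(x, y, max_slices=None):
    """Dot product whose result is independent of the summation order."""
    n = len(x)
    # rho = ceil((53 + ceil(log2 n)) / 2), computed with integers only.
    rho = (53 + (n - 1).bit_length() + 1) // 2
    xs = split(x, rho, max_slices)
    ys = split(y, rho, max_slices)
    # Every partial dot product below is free of rounding error, because the
    # products and their running sums fit into the 53-bit significand.
    partials = [sum(a * b for a, b in zip(sx, sy)) for sx in xs for sy in ys]
    # math.fsum returns the correctly rounded sum of the exact partials.
    return math.fsum(partials)


def naive_dot(x, y):
    """Plain left-to-right accumulation, for comparison."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


x = [1e16, 1.0, -1e16, 0.1]
y = [1.0, 3.0, 1.0, 10.0]

# Exact reference computed with rational arithmetic, then rounded once.
exact = float(sum(Fraction(a) * Fraction(b) for a, b in zip(x, y)))

print("naive, forward :", naive_dot(x, y))
print("naive, reversed:", naive_dot(x[::-1], y[::-1]))
print("sketch, forward :", ozaki_dot(x, y))
print("sketch, reversed:", ozaki_dot(x[::-1], y[::-1]))
print("exact reference :", exact)
```

The naive results may differ between the two orderings, whereas the sketch returns the same value for both. Passing a small `max_slices` limits the number of slices and so reduces the number of partial products at the cost of accuracy, which is one way to read "tunable accuracy" in this simplified setting. In the paper's setting the partial products are matrix products computed by standard level-3 BLAS routines rather than Python loops.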

Original language: English
Title of host publication: Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers
Editors: Roman Wyrzykowski, Konrad Karczewski, Ewa Deelman, Jack Dongarra
Publisher: Springer
Pages: 516-527
Number of pages: 12
ISBN (Print): 9783030432287
DOIs: https://doi.org/10.1007/978-3-030-43229-4_44
Publication status: Published - 2020
Event: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 - Bialystok, Poland
Duration: 2019 Sep 8 to 2019 Sep 11

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12043 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019
Country: Poland
City: Bialystok
Period: 19/9/8 to 19/9/11

Keywords

  • Accurate
  • BLAS
  • Reproducible

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Mukunoki, D., Ogita, T., & Ozaki, K. (2020). Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures. In R. Wyrzykowski, K. Karczewski, E. Deelman, & J. Dongarra (Eds.), Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers (pp. 516-527). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12043 LNCS). Springer. https://doi.org/10.1007/978-3-030-43229-4_44