Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Research output: Conference contribution

Abstract

IEEE 754-2008 binary128 (quadruple precision, with a 15-bit exponent and a 113-bit significand) is not currently implemented in hardware on x86, although some compilers provide software emulation. This emulation is significantly slower than binary64 arithmetic, which is supported natively in hardware. This study proposes a fast implementation of matrix multiplication for matrices stored in the binary128 format on x86 CPUs. The proposed implementation uses the Ozaki scheme, an accurate matrix multiplication algorithm proposed by Ozaki et al. in 2012. The scheme performs most of the computation with binary64 matrix multiplication (the DGEMM routine in the Basic Linear Algebra Subprograms (BLAS)), so it can exploit the high performance of highly optimized vendor BLAS libraries. Although the achievable performance depends on the input matrices (the inner-product dimension, the absolute range, and the significand bit length), in many cases the proposed implementation achieves better performance and accuracy than naive matrix multiplication performed with GCC's binary128 emulation. In addition, we discuss GPU acceleration, performance on reduced-precision inputs, an implementation based on binary32 matrix multiplication (SGEMM), application to memory-intensive operations, and the possibility of a distributed parallel implementation.
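The core idea the abstract describes, splitting the inputs so that ordinary binary64 products carry no rounding error, can be illustrated with a small sketch. This is not the authors' implementation, and it makes several simplifying assumptions: the inputs are binary64 rather than binary128, a plain Python triple loop stands in for DGEMM, and exact rational accumulation with `Fraction` stands in for the final binary128 summation so that exactness can be checked. The splitting rule (slice width set by `rho = ceil((53 + log2 k) / 2)`) follows the commonly described form of the scheme. All function names are hypothetical, and overflow and underflow are ignored.

```python
import math
from fractions import Fraction


def split_rows(a, rho):
    """Split matrix `a` (a list of rows) into slices whose exact sum is `a`."""
    rest = [[float(x) for x in row] for row in a]
    slices = []
    # Always emit at least one slice so an all-zero matrix is handled too.
    while not slices or any(x != 0.0 for row in rest for x in row):
        top = []
        for i, row in enumerate(rest):
            # e satisfies max|row| < 2**e (frexp returns e = 0 for a zero row).
            _, e = math.frexp(max(abs(x) for x in row))
            sigma = math.ldexp(1.0, e + rho)
            # Adding then subtracting sigma rounds away the low-order bits.
            hi = [(x + sigma) - sigma for x in row]
            rest[i] = [x - h for x, h in zip(row, hi)]  # remainder, computed exactly
            top.append(hi)
        slices.append(top)
    return slices


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul64(a, b):
    """Plain binary64 matrix product (assumption: stands in for a DGEMM call)."""
    out = [[0.0] * len(b[0]) for _ in a]
    for i, row in enumerate(a):
        for l, x in enumerate(row):
            for j, y in enumerate(b[l]):
                out[i][j] += x * y
    return out


def ozaki_products(a, b):
    """Return the slice-by-slice products; each one is free of rounding error."""
    k = len(b)  # inner-product dimension
    rho = math.ceil((53 + math.log2(k)) / 2)
    a_slices = split_rows(a, rho)                                     # row-wise scaling for A
    b_slices = [transpose(s) for s in split_rows(transpose(b), rho)]  # column-wise for B
    return [matmul64(sa, sb) for sa in a_slices for sb in b_slices]


def exact_sum(mats):
    """Add matrices entrywise without rounding (assumption: Fraction replaces binary128)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(Fraction(m[i][j]) for m in mats) for j in range(cols)]
            for i in range(rows)]


a = [[1.0 / 3.0, 1e10, -7.25], [2.0 ** -30, 0.1, 3.0]]
b = [[0.3, -1e-5], [1.0 / 7.0, 2.5], [1e8, 0.7]]
products = ozaki_products(a, b)
reference = [[sum(Fraction(a[i][l]) * Fraction(b[l][j]) for l in range(len(b)))
              for j in range(len(b[0]))] for i in range(len(a))]
print(len(products), exact_sum(products) == reference)
```

The number of slice products grows with the exponent range and significand length of the inputs, which is consistent with the abstract's remark that performance depends on the inner-product dimension, the absolute range, and the significand bit length.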

Original language: English
Title of host publication: 50th International Conference on Parallel Processing, ICPP 2021 - Main Conference Proceedings
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450390682
Publication status: Published - Aug 9, 2021
Event: 50th International Conference on Parallel Processing, ICPP 2021 - Virtual, Online, United States
Duration: Aug 9, 2021 to Aug 12, 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 50th International Conference on Parallel Processing, ICPP 2021
Country/Territory: United States
City: Virtual, Online
Period: Aug 9, 2021 to Aug 12, 2021

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
