Threaded accurate matrix-matrix multiplications with sparse matrix-vector multiplications

Shuntaro Ichimura, Takahiro Katagiri, Katsuhisa Ozaki, Takeshi Ogita, Toru Nagai

Research output: Conference contribution

1 Citation (Scopus)

Abstract

Basic Linear Algebra Subprograms (BLAS) is a frequently used numerical library for linear algebra computations. However, it places little emphasis on computational accuracy, particularly on accuracy assurance of the results. Although algorithms for ensuring the computational accuracy of BLAS operations have been studied, their performance on advanced computer architectures still needs evaluation. In this study, we parallelize high-precision matrix-matrix multiplication using thread-level parallelism and evaluate its performance in terms of execution speed and accuracy. We implement a method that converts dense matrices into sparse matrices by exploiting the nature of the target algorithm, and then applies sparse matrix-vector multiplication. Results obtained on the FX100 supercomputer system at Nagoya University indicate that (1) the implementation with the ELL format achieves a 1.43x speedup, and (2) a maximum 38x speedup is obtained compared to a conventional dense implementation with dgemm.
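The abstract's approach hinges on storing the sparsified split matrices in the ELL (ELLPACK) format and replacing dense products with sparse matrix-vector products. As a rough illustration of that storage layout (this is not the authors' implementation; the function names and zero-padding convention here are our own), a minimal sketch:

```python
import numpy as np

def dense_to_ell(A, pad_col=0):
    """Convert a dense matrix to ELL storage: a values array and a
    column-index array, both of shape (rows, max nonzeros per row),
    padded with zero values and a dummy column index."""
    m = A.shape[0]
    nnz_per_row = (A != 0).sum(axis=1)
    k = int(nnz_per_row.max())          # width = densest row
    vals = np.zeros((m, k))
    cols = np.full((m, k), pad_col, dtype=np.int64)
    for i in range(m):
        idx = np.nonzero(A[i])[0]
        vals[i, :idx.size] = A[i, idx]
        cols[i, :idx.size] = idx
    return vals, cols

def ell_spmv(vals, cols, x):
    """y = A @ x using the ELL layout; padded entries have value 0,
    so they contribute nothing to each row's dot product."""
    return (vals * x[cols]).sum(axis=1)
```

The fixed row width is what makes ELL attractive on vector- and SIMD-oriented hardware such as the FX100: every row issues the same number of multiply-adds, at the cost of padding when row lengths vary.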

Original language: English
Title of host publication: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1093-1102
Number of pages: 10
ISBN (Print): 9781538655559
DOI: 10.1109/IPDPSW.2018.00168
Publication status: Published - Aug 3 2018
Externally published: Yes
Event: 32nd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018 - Vancouver, Canada
Duration: May 21 2018 - May 25 2018

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

Cite this

Ichimura, S., Katagiri, T., Ozaki, K., Ogita, T., & Nagai, T. (2018). Threaded accurate matrix-matrix multiplications with sparse matrix-vector multiplications. In Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018 (pp. 1093-1102). [8425535] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPSW.2018.00168

@inproceedings{468940a2eaab4a839cbeeef718f12383,
title = "Threaded accurate matrix-matrix multiplications with sparse matrix-vector multiplications",
abstract = "Basic Linear Algebra Subprograms (BLAS) is a frequently used numerical library for linear algebra computations. However, it places little emphasis on computational accuracy, especially with respect to the accuracy assurance of the results. Although some algorithms for ensuring the computational accuracy of BLAS operations have been studied, there is a need for performance evaluation in advanced computer architectures. In this study, we parallelize high-precision matrix-matrix multiplication using thread-level parallelism. In addition, we conduct a performance evaluation from the viewpoints of execution speed and accuracy. We implement a method to convert dense matrices into sparse matrices by exploiting the nature of the target algorithm and adapting sparse-vector multiplication. Results obtained using the FX100 supercomputer system at Nagoya University indicate that (1) implementation with the ELL format achieves 1.43x speedup and (2) a maximum of 38x speedup compared to conventional implementation for dense matrix operations with dgemm.",
keywords = "Accuracy Assurance, Component, Error-free Transformation, High-precision Matrix-Matrix Multiplications, Sparse Matrix-vector Multiplications, Thread Parallelism",
author = "Shuntaro Ichimura and Takahiro Katagiri and Katsuhisa Ozaki and Takeshi Ogita and Toru Nagai",
year = "2018",
month = "8",
day = "3",
doi = "10.1109/IPDPSW.2018.00168",
language = "English",
isbn = "9781538655559",
pages = "1093--1102",
booktitle = "Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
