TSM2: Optimizing tall-and-skinny matrix-matrix multiplication on GPUs

Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan Debardeleben, Qiang Guan, Zizhong Chen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

29 Scopus citations

Abstract

Linear algebra operations are widely used in big data analytics and scientific computing. Much work has been done on optimizing linear algebra operations on GPUs for regular-shaped input. However, few works focus on fully utilizing GPU resources when the input is not regular-shaped. Current optimizations do not fully utilize memory bandwidth and computing power, so they achieve only sub-optimal performance. In this paper, we propose TSM2, a high-performance tall-and-skinny matrix-matrix multiplication algorithm for GPUs that targets linear algebra operations with non-regular-shaped input. We implement the proposed algorithm and test it on three different Nvidia GPU micro-architectures: Kepler, Maxwell, and Pascal. Experiments show that, compared to current state-of-the-art works, TSM2 speeds up the computation by 1.1x to 3x, improves memory bandwidth utilization by 8% to 47.6%, and improves computing power utilization by 7% to 37.3%. We replace the original matrix operations in K-means and Algorithm-Based Fault Tolerance (ABFT) with TSM2 and achieve speedups of up to 1.89x and 1.90x, respectively.
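
To make the memory-bandwidth argument concrete, the following Python sketch computes the arithmetic intensity of a tall-and-skinny product under a simple roofline model. It assumes, for illustration only, that an n x n matrix A is multiplied by an n x k matrix B with k much smaller than n, in double precision, and that A, B, and C each cross DRAM exactly once. The abstract does not specify TSM2's exact shapes or kernel design; all function names and hardware figures here are hypothetical, and the reference multiply is a generic row-streaming loop, not the TSM2 GPU kernel.

```python
# Hypothetical sketch: why tall-and-skinny GEMM tends to be memory-bound.
# Assumed shapes (illustration only): A is n x n, B is n x k, C = A @ B is n x k,
# with k << n, stored in double precision (8 bytes per element).


def tsm_flops(n: int, k: int) -> int:
    """FLOPs for C = A @ B: one multiply and one add per inner-product term."""
    return 2 * n * n * k


def tsm_min_bytes(n: int, k: int, elem_bytes: int = 8) -> int:
    """Compulsory DRAM traffic: read A and B once, write C once."""
    return elem_bytes * (n * n + n * k + n * k)


def arithmetic_intensity(n: int, k: int, elem_bytes: int = 8) -> float:
    """FLOPs per byte of compulsory traffic; tends to k / 4 for large n (doubles)."""
    return tsm_flops(n, k) / tsm_min_bytes(n, k, elem_bytes)


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: performance is capped by compute or by bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def tsm_reference(A, B):
    """Generic row-streaming multiply (not the TSM2 kernel): each A[i][p] is
    read once and reused across all k columns of the skinny matrix B."""
    n_rows, inner = len(A), len(A[0])
    k = len(B[0])
    C = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        row_c = C[i]
        for p in range(inner):
            a = A[i][p]
            b_row = B[p]
            for j in range(k):
                row_c[j] += a * b_row[j]
    return C


if __name__ == "__main__":
    n, k = 10240, 8
    ai = arithmetic_intensity(n, k)
    # Hypothetical GPU: 5000 GFLOP/s peak, 500 GB/s DRAM bandwidth.
    bound = attainable_gflops(ai, peak_gflops=5000.0, bandwidth_gbs=500.0)
    print(f"intensity = {ai:.3f} FLOP/byte, roofline bound = {bound:.1f} GFLOP/s")
```

With n = 10240 and k = 8 the intensity is about 2 FLOP/byte (roughly k/4 for large n), so on the hypothetical GPU above the roofline cap is just under 1000 GFLOP/s, far below the assumed 5000 GFLOP/s peak. Under these assumptions the product is limited by bandwidth rather than arithmetic, which is why memory bandwidth utilization is a natural metric for this problem.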

Original language: English
Title of host publication: ICS 2019 - International Conference on Supercomputing
Pages: 106-116
Number of pages: 11
ISBN (Electronic): 9781450360791
State: Published - Jun 26 2019
Event: 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019 - Phoenix, United States
Duration: Jun 26 2019 → …

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019
Country/Territory: United States
City: Phoenix
Period: 6/26/19 → …

Bibliographical note

Publisher Copyright:
© 2019 ACM.

Keywords

  • GEMM
  • GPU
  • Matrix-matrix multiplication
  • Optimization
  • Tall-and-skinny

ASJC Scopus subject areas

  • General Computer Science
