Abstract
Linear algebra operations are widely used in big data analytics and scientific computing. Much work has been done on optimizing linear algebra operations on GPUs for regular-shaped input. However, little work focuses on fully utilizing GPU resources when the input is not regular-shaped. Current optimizations do not consider fully utilizing the memory bandwidth and computing power, so they achieve only sub-optimal performance. In this paper, we propose TSM2, a performant tall-and-skinny matrix-matrix multiplication algorithm for GPUs. It focuses on optimizing linear algebra operations with non-regular-shaped input. We implement the proposed algorithm and test it on three Nvidia GPU micro-architectures: Kepler, Maxwell, and Pascal. Experiments show that TSM2 speeds up the computation by 1.1x - 3x, improves memory bandwidth utilization by 8% - 47.6%, and improves computing power utilization by 7% - 37.3% compared to current state-of-the-art works. We replace the original matrix operations in K-means and Algorithm-Based Fault Tolerance (ABFT) with TSM2 and achieve up to 1.89x and 1.90x speedup, respectively.
Original language | English |
---|---|
Title of host publication | ICS 2019 - International Conference on Supercomputing |
Pages | 106-116 |
Number of pages | 11 |
ISBN (Electronic) | 9781450360791 |
State | Published - Jun 26 2019 |
Event | 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019 - Phoenix, United States Duration: Jun 26 2019 → … |
Publication series
Name | Proceedings of the International Conference on Supercomputing |
---|---|
Conference
Conference | 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019 |
---|---|
Country/Territory | United States |
City | Phoenix |
Period | 6/26/19 → … |
Bibliographical note
Publisher Copyright: © 2019 ACM.
Keywords
- GEMM
- GPU
- Matrix-matrix multiplication
- Optimization
- Tall-and-skinny
ASJC Scopus subject areas
- General Computer Science