Abstract
As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exa-scale distributed and parallel applications. Traditional approaches directly utilize error-bounded lossy compression to accelerate collective computation operations, exposing unsatisfying performance due to the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a light-weight compressor, optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic to dynamically select efficient compression pipelines for reducing the cost of DOC handling. We evaluate hZCCL with up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53×. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12× and 6.77× compared to original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
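The contrast between the DOC workflow and operating directly on compressed data can be illustrated with a toy example. The sketch below is not the hZCCL compressor: it assumes a plain linear quantizer with step `2 * error_bound`, and every function name in it is hypothetical. Under that assumption, adding the integer codes of two buffers gives the same compressed result as decompressing, adding, and recompressing.

```python
# Toy illustration only: a linear quantizer stands in for a real
# error-bounded lossy compressor. All names here are hypothetical.

def compress(values, error_bound):
    """Map each value to an integer code with |x - decoded| <= error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def decompress(codes, error_bound):
    """Reconstruct approximate values from integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

def doc_sum(codes_a, codes_b, error_bound):
    """Decompression-operation-compression: decode, add, re-encode."""
    a = decompress(codes_a, error_bound)
    b = decompress(codes_b, error_bound)
    return compress([x + y for x, y in zip(a, b)], error_bound)

def homomorphic_sum(codes_a, codes_b):
    """Add directly in the compressed (integer code) domain."""
    return [x + y for x, y in zip(codes_a, codes_b)]
```

In this toy setting each input carries an error of at most `error_bound`, so the reconstructed sum is within `2 * error_bound` of the exact sum on either path; the homomorphic path simply skips the decode and re-encode steps.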
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2024 |
Subtitle of host publication | International Conference for High Performance Computing, Networking, Storage and Analysis |
ISBN (Electronic) | 9798350352917 |
DOIs | |
State | Published - 2024 |
Event | 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States; Duration: Nov 17 2024 → Nov 22 2024 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 11/17/24 → 11/22/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Collective Communication
- Distributed Computing
- Homomorphic Compression
- Parallel Algorithm
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Software