Abstract
As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exa-scale distributed and parallel applications. Traditional approaches directly utilize error-bounded lossy compression to accelerate collective computation operations, exposing unsatisfying performance due to the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a light-weight compressor, optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic to dynamically select efficient compression pipelines for reducing the cost of DOC handling. We evaluate hZCCL with up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53×. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12× and 6.77× compared to original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
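The contrast between the DOC workflow and operating directly on compressed data can be illustrated with a minimal sketch. It is not the hZCCL compressor: it assumes a plain uniform scalar quantizer with an absolute error bound, and every name in it (`compress`, `decompress`, `doc_sum`, `homomorphic_sum`) is hypothetical. Under that assumption, adding the integer codes of two compressed arrays gives the same compressed result as decompressing, adding, and recompressing, without the two extra passes.

```python
# Illustrative only: a uniform quantizer stands in for an error-bounded
# lossy compressor. This is an assumption, not the hZCCL pipeline.

def compress(values, error_bound):
    """Quantize each value to an integer code; |v - code*step| <= error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def decompress(codes, error_bound):
    """Reconstruct approximate values from integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

def doc_sum(codes_a, codes_b, error_bound):
    """Decompression-operation-compression: decode, add, re-encode."""
    a = decompress(codes_a, error_bound)
    b = decompress(codes_b, error_bound)
    return compress([x + y for x, y in zip(a, b)], error_bound)

def homomorphic_sum(codes_a, codes_b):
    """Add directly in the compressed (integer code) domain."""
    return [x + y for x, y in zip(codes_a, codes_b)]

error_bound = 0.01
data_a = [0.137, -2.5, 3.14159, 10.0]
data_b = [1.0, 0.333, -0.76, 2.718]

codes_a = compress(data_a, error_bound)
codes_b = compress(data_b, error_bound)

summed_codes = homomorphic_sum(codes_a, codes_b)
summed_values = decompress(summed_codes, error_bound)
```

In this sketch each input carries an error of at most `error_bound`, so the reduced result differs from the exact sum by at most twice that bound. A real compressor adds prediction and encoding stages that this example leaves out.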
| Original language | English |
|---|---|
| Title of host publication | Proceedings of SC 2024 |
| Subtitle of host publication | International Conference for High Performance Computing, Networking, Storage and Analysis |
| ISBN (Electronic) | 9798350352917 |
| DOIs | |
| State | Published - Nov 17 2024 |
| Event | 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States; Duration: Nov 17 2024 → Nov 22 2024 |
Publication series
| Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
|---|---|
| ISSN (Print) | 2167-4329 |
| ISSN (Electronic) | 2167-4337 |
Conference
| Conference | 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 |
|---|---|
| Country/Territory | United States |
| City | Atlanta |
| Period | 11/17/24 → 11/22/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Funding
This material was supported by the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contracts DE-AC02-06CH11357 and DE-SC0024207. The material was also supported by the National Science Foundation under Grants OAC-2003709, OAC-2104023, OAC-2311875, OAC-2340982, OAC-2348465, and OIA-2327266. The experimental resource for this paper was provided by the Laboratory Computing Resource Center on the Bebop cluster at Argonne National Laboratory.
| Funders | Funder number |
|---|---|
| U.S. Department of Energy | |
| Argonne National Laboratory | |
| Office of Science Programs | |
| Laboratory Computing Resource Center | |
| National Science Foundation Arctic Social Science Program | OAC-2003709, OAC-2104023, OAC-2340982, OIA-2327266, OAC-2311875, OAC-2348465 |
| Advanced Scientific Computing Research | DE-SC0024207, DE-AC02-06CH11357 |
Keywords
- Collective Communication
- Distributed Computing
- Homomorphic Compression
- Parallel Algorithm
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications
Fingerprint
Dive into the research topics of 'HZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression'. Together they form a unique fingerprint.