
hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression

  • Jiajun Huang
  • Sheng Di
  • Xiaodong Yu
  • Yujia Zhai
  • Jinyang Liu
  • Zizhe Jian
  • Xin Liang
  • Kai Zhao
  • Xiaoyi Lu
  • Zizhong Chen
  • Franck Cappello
  • Yanfei Guo
  • Rajeev Thakur

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

7 Scopus citations

Abstract

As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exa-scale distributed and parallel applications. Traditional approaches directly utilize error-bounded lossy compression to accelerate collective computation operations, resulting in unsatisfactory performance due to the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a light-weight compressor, optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic to dynamically select efficient compression pipelines for reducing the cost of DOC handling. We evaluate hZCCL with up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53×. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12× and 6.77× compared to original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
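
The contrast between the DOC workflow and operating directly on compressed data can be illustrated with a toy example. The sketch below is not the hZCCL compressor, whose design the abstract does not describe; it assumes a plain uniform quantizer with bin width 2·eb as a stand-in for an error-bounded lossy compressor, and all function names (`compress`, `decompress`, `doc_sum`, `homomorphic_sum`) are hypothetical. Under that assumption, adding the integer codes elementwise gives the same result as decompressing, adding, and recompressing.

```python
def compress(values, eb):
    """Toy error-bounded quantizer (an assumption, not hZCCL's compressor).

    Each value is mapped to the index of a bin of width 2*eb, so the
    reconstruction error per value is at most eb.
    """
    return [round(v / (2 * eb)) for v in values]


def decompress(codes, eb):
    """Reconstruct each value as the center of its quantization bin."""
    return [c * 2 * eb for c in codes]


def doc_sum(codes_a, codes_b, eb):
    """Decompression-operation-compression (DOC): the baseline workflow."""
    a = decompress(codes_a, eb)
    b = decompress(codes_b, eb)
    return compress([x + y for x, y in zip(a, b)], eb)


def homomorphic_sum(codes_a, codes_b):
    """Sum performed directly on the compressed integer codes."""
    return [x + y for x, y in zip(codes_a, codes_b)]


if __name__ == "__main__":
    eb = 0.01  # absolute error bound per input value
    a = [0.10, 1.25, -3.70, 2.00]
    b = [0.33, -1.20, 4.05, 0.01]
    ca, cb = compress(a, eb), compress(b, eb)

    # Both paths produce the same compressed result for this toy quantizer.
    print(homomorphic_sum(ca, cb) == doc_sum(ca, cb, eb))

    # The reconstructed sum stays within 2*eb of the exact sum, because
    # each of the two inputs contributes at most eb of error.
    approx = decompress(homomorphic_sum(ca, cb), eb)
    print(max(abs(s - (x + y)) for s, x, y in zip(approx, a, b)) <= 2 * eb + 1e-9)
```

In this toy setting the homomorphic path skips two decompressions and one recompression per reduction step, which is the kind of saving the abstract attributes to avoiding DOC. The throughput and speedup figures reported above come from the paper's own compressor and cannot be inferred from this sketch.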

Original language: English
Title of host publication: Proceedings of SC 2024
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
ISBN (Electronic): 9798350352917
DOIs
State: Published - Nov 17 2024
Event: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States
Duration: Nov 17 2024 to Nov 22 2024

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 to 11/22/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Funding

This material was supported by the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contracts DE-AC02-06CH11357 and DE-SC0024207. The material was also supported by the National Science Foundation under Grants OAC-2003709, OAC-2104023, OAC-2311875, OAC-2340982, OAC-2348465, and OIA-2327266. The experimental resource for this paper was provided by the Laboratory Computing Resource Center on the Bebop cluster at Argonne National Laboratory.

Funders (with funder numbers where listed)
U.S. Department of Energy
Argonne National Laboratory
Office of Science Programs
Laboratory Computing Resource Center
National Science Foundation Arctic Social Science Program: OAC-2003709, OAC-2104023, OAC-2340982, OIA-2327266, OAC-2311875, OAC-2348465
Advanced Scientific Computing Research: DE-SC0024207, DE-AC02-06CH11357

    Keywords

    • Collective Communication
    • Distributed Computing
    • Homomorphic Compression
    • Parallel Algorithm

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Science Applications
    • Computer Networks and Communications
