
hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression

  • Jiajun Huang
  • Sheng Di
  • Xiaodong Yu
  • Yujia Zhai
  • Jinyang Liu
  • Zizhe Jian
  • Xin Liang
  • Kai Zhao
  • Xiaoyi Lu
  • Zizhong Chen
  • Franck Cappello
  • Yanfei Guo
  • Rajeev Thakur

Research output: Conference contribution, peer-reviewed

7 Citations (Scopus)

Abstract

As network bandwidth struggles to keep up with rapidly growing computing capabilities, the efficiency of collective communication has become a critical challenge for exascale distributed and parallel applications. Traditional approaches apply error-bounded lossy compression directly to accelerate collective computation operations, but they perform poorly because of the expensive decompression-operation-compression (DOC) workflow. To address this issue, we present a first-ever homomorphic compression-communication co-design, hZCCL, which enables operations to be performed directly on compressed data, saving the cost of time-consuming decompression and recompression. In addition to the co-design framework, we build a lightweight compressor optimized specifically for multi-core CPU platforms. We also present a homomorphic compressor with a run-time heuristic that dynamically selects efficient compression pipelines to reduce the cost of DOC handling. We evaluate hZCCL with up to 512 nodes and across five application datasets. The experimental results demonstrate that our homomorphic compressor achieves a CPU throughput of up to 379.08 GB/s, surpassing the conventional DOC workflow by up to 36.53×. Moreover, our hZCCL-accelerated collectives outperform two state-of-the-art baselines, delivering speedups of up to 2.12× and 6.77× compared to original MPI collectives in single-thread and multi-thread modes, respectively, while maintaining data accuracy.
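The contrast between the DOC workflow and a homomorphic reduction can be illustrated with a toy example. The sketch below is not the hZCCL compressor: it assumes a plain uniform quantizer (an illustrative stand-in for an error-bounded lossy compressor), and every function name in it is hypothetical. With such a quantizer, adding the integer codes of two buffers gives the same compressed result as decompressing, adding, and recompressing, so the reduction can skip both of those steps.

```python
from typing import List


def compress(values: List[float], error_bound: float) -> List[int]:
    """Toy error-bounded quantizer (an assumption, not the paper's compressor).

    Each value is mapped to the nearest multiple of 2 * error_bound, so the
    reconstruction error per value is at most error_bound.
    """
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def decompress(codes: List[int], error_bound: float) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def reduce_doc(a: List[int], b: List[int], error_bound: float) -> List[int]:
    """Decompression-operation-compression: the workflow the abstract calls costly."""
    a_vals = decompress(a, error_bound)
    b_vals = decompress(b, error_bound)
    summed = [x + y for x, y in zip(a_vals, b_vals)]
    return compress(summed, error_bound)


def reduce_homomorphic(a: List[int], b: List[int]) -> List[int]:
    """Sum performed directly on the compressed codes, with no decompression."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    eb = 0.01
    rank0 = [0.10, 0.52, -1.30]   # hypothetical data held by one process
    rank1 = [0.34, -0.20, 2.00]   # hypothetical data held by another process
    c0, c1 = compress(rank0, eb), compress(rank1, eb)
    print("DOC result:        ", reduce_doc(c0, c1, eb))
    print("Homomorphic result:", reduce_homomorphic(c0, c1))
```

In this toy setting each operand contributes at most one error bound of quantization error, so the decompressed sum of two buffers differs from the exact sum by at most twice the bound. How hZCCL itself controls accuracy and selects compression pipelines is described in the paper, not in this sketch.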

Original language: English
Title of host publication: Proceedings of SC 2024
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
ISBN (electronic): 9798350352917
DOI
State: Published - Nov 17, 2024
Event: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States
Duration: Nov 17, 2024 to Nov 22, 2024

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (print): 2167-4329
ISSN (electronic): 2167-4337

Conference

Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 to 11/22/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Funding

This material was supported by the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contracts DE-AC02-06CH11357 and DE-SC0024207. The material was also supported by the National Science Foundation under Grants OAC-2003709, OAC-2104023, OAC-2311875, OAC-2340982, OAC-2348465, and OIA-2327266. The experimental resource for this paper was provided by the Laboratory Computing Resource Center on the Bebop cluster at Argonne National Laboratory.

Funders (with funder number where listed)
U.S. Department of Energy
Argonne National Laboratory
Office of Science Programs
Laboratory Computing Resource Center
National Science Foundation Arctic Social Science Program: OAC-2003709, OAC-2104023, OAC-2340982, OIA-2327266, OAC-2311875, OAC-2348465
Advanced Scientific Computing Research: DE-SC0024207, DE-AC02-06CH11357

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Science Applications
    • Computer Networks and Communications
