Ultrafast Error-bounded Lossy Compression for Scientific Datasets

Xiaodong Yu, Sheng Di, Kai Zhao, Jiannan Tian, Dingwen Tao, Xin Liang, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Today's scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community because it not only can significantly reduce the data volumes but also can strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultrafast compression speed, which is highly demanded by numerous applications or use cases (such as in-memory compression and online instrument data compression). In this paper, we propose a novel ultrafast error-bounded lossy compressor that can obtain fairly high compression performance on both CPUs and GPUs and with reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework, called SZx, that achieves ultrafast performance through its novel design comprising only lightweight operations such as bitwise and addition/subtraction operations, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize the performance according to their architectures. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2x to 16x faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, with respect to both compression and decompression.
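
To make the notion of a user-specified error bound concrete, the following toy Python sketch replaces any block whose value range fits within twice the bound by its midpoint and stores all other blocks unchanged. It is an illustration only and should not be read as the SZx algorithm from the paper; the function names, the default block size of 4, and the tuple layout of a compressed block are all assumptions made for this example.

```python
def compress_blocks(values, error_bound, block_size=4):
    """Toy error-bounded scheme (hypothetical, not SZx itself).

    A block whose value range is at most 2 * error_bound is stored as a
    single midpoint; any other block is kept verbatim.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        lo, hi = min(block), max(block)
        if hi - lo <= 2 * error_bound:
            # Every value lies within error_bound of the midpoint
            # (up to floating-point rounding).
            blocks.append(("const", len(block), lo + (hi - lo) / 2))
        else:
            # Range too wide for one representative value: keep the data as is.
            blocks.append(("raw", len(block), list(block)))
    return blocks


def decompress_blocks(blocks):
    """Rebuild the value sequence from the toy block representation."""
    out = []
    for kind, count, payload in blocks:
        if kind == "const":
            out.extend([payload] * count)
        else:
            out.extend(payload)
    return out
```

With an error bound of 0.25, a smooth block such as 1.0, 1.25, 1.5, 1.0 collapses to one stored value, while a block with a wide range is passed through unchanged, so the reconstruction never deviates from the input by more than the bound.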

Original language: English
Title of host publication: HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing
Pages: 159-171
Number of pages: 13
ISBN (Electronic): 9781450391993
State: Published - Jun 27 2022
Event: 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022 - Virtual, Online, United States
Duration: Jun 27 2022 → Jun 30 2022

Publication series

Name: HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022
Country/Territory: United States
City: Virtual, Online
Period: 6/27/22 → 6/30/22

Bibliographical note

Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation’s exascale computing imperative. This research was also supported by ARAMCO. The material was supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This research was also supported by the U.S. National Science Foundation under Grants OAC-2042084, OAC-2003709, OAC-2104023, and OAC-2104024.

Publisher Copyright:
© 2022 ACM.

Keywords

  • error-bounded lossy compression
  • gpu
  • high-speed compressor
  • scientific data

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
