Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization

Kai Zhao, Sheng Di, Xin Liang, Sihuan Li, Dingwen Tao, Zizhong Chen, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

39 Scopus citations

Abstract

Today's extreme-scale high-performance computing (HPC) applications are producing volumes of data too large to save or transfer because of limited storage space and I/O bandwidth. Error-bounded lossy compression is widely regarded as one of the best solutions to the big science data issue, because it can significantly reduce the data volume while strictly controlling data distortion according to user requirements. In this work, we develop an adaptive parameter optimization algorithm integrated with a series of optimization strategies for SZ, a state-of-the-art prediction-based compression model. Our contribution is threefold. (1) We use 2nd-order regression and 2nd-order Lorenzo predictors to significantly improve the prediction accuracy of SZ, thus substantially improving the overall compression quality. (2) We design an efficient approach for selecting the best-fit parameter setting, by conducting a comprehensive a priori compression quality analysis and exploiting an efficient online controlling mechanism. (3) We evaluate the compression quality and performance on a supercomputer with 4,096 cores, in comparison with other state-of-the-art error-bounded lossy compressors. Experiments with multiple real-world HPC simulation datasets show that our solution can improve the compression ratio by up to 46% compared with the second-best compressor. Moreover, the parallel I/O performance is improved by up to 40% thanks to the significant reduction of data size.
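To make the idea of prediction-based, error-bounded compression concrete, the following is a minimal one-dimensional sketch. It is not the authors' implementation and not the SZ API: the function names (`predict`, `compress`, `decompress`), the 1D predictor forms (previous value for first order, `2*x[i-1] - x[i-2]` for second order), and the simple uniform quantizer are all assumptions chosen for illustration. The sketch predicts each value from already reconstructed neighbors and stores only an integer code for the prediction error, so that the reconstruction stays within the error bound `eb`.

```python
def predict(recon, i, order):
    """Predict value i from already reconstructed values (illustrative 1D forms)."""
    if order == 2 and i >= 2:
        # Second-order (linear extrapolation) predictor.
        return 2 * recon[i - 1] - recon[i - 2]
    if i >= 1:
        # First-order predictor: previous reconstructed value.
        return recon[i - 1]
    return 0.0


def compress(data, eb, order=2):
    """Return integer quantization codes for the prediction errors."""
    codes, recon = [], []
    for i, x in enumerate(data):
        pred = predict(recon, i, order)
        # Uniform quantization with bin width 2*eb keeps |x - r| <= eb
        # (up to floating-point rounding).
        code = round((x - pred) / (2 * eb))
        codes.append(code)
        # Track the reconstructed value so the decompressor can mirror us.
        recon.append(pred + 2 * eb * code)
    return codes


def decompress(codes, eb, order=2):
    """Rebuild the values by repeating the same predictions."""
    recon = []
    for i, code in enumerate(codes):
        recon.append(predict(recon, i, order) + 2 * eb * code)
    return recon


# Hypothetical smooth input: a quadratic curve.
samples = [0.1 * i * i for i in range(50)]
bound = 0.01
for k in (1, 2):
    ks = compress(samples, bound, order=k)
    print("order", k, "largest |code|:", max(abs(c) for c in ks))
```

On smooth data such as this quadratic, the second-order predictor leaves smaller residuals, so the codes cluster closer to zero and would be cheaper to entropy-code. That is the intuition behind better prediction leading to a better compression ratio; the multi-dimensional predictors, regression, and parameter selection described in the abstract are not shown here.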

Original language: English
Title of host publication: HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing
Pages: 89-100
Number of pages: 12
ISBN (Electronic): 9781450370523
State: Published - Jun 23 2020
Event: 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2020 - Stockholm, Sweden
Duration: Jun 23 2020 to Jun 26 2020

Publication series

Name: HPDC 2020 - Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2020
Country/Territory: Sweden
City: Stockholm
Period: 6/23/20 to 6/26/20

Bibliographical note

Publisher Copyright:
© 2020 Owner/Author.

Keywords

  • high-performance computing
  • lossy compression
  • parameter optimization
  • rate distortion
  • science data

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
