CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction

Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Shixun Wu, Jiajun Huang, Zizhong Chen, Franck Cappello

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

Benefiting from cutting-edge supercomputers that support extremely large-scale scientific simulations, climate research has advanced significantly over the past decades. However, new critical challenges have arisen in efficiently storing and transferring large-scale climate data among distributed repositories and databases for post hoc analysis. In this paper, we develop CliZ, an efficient online error-controlled lossy compression method with optimized data prediction and encoding methods for climate datasets across various climate models. On the one hand, we explore how to exploit particular properties of climate datasets (such as mask-map information, dimension permutation/fusion, and data periodicity patterns) to improve data prediction accuracy. On the other hand, CliZ features a novel multi-Huffman encoding method that significantly improves encoding efficiency and, therefore, compression ratios. We evaluated CliZ against many other state-of-the-art error-controlled lossy compressors (including SZ3, ZFP, SPERR, and QoZ) on multiple real-world climate datasets from different models. Experiments show that CliZ outperforms the second-best compressor (SZ3, SPERR, or QoZ1.1) on climate datasets by 20%-200% in compression ratio. CliZ also reduces the data transfer cost between two remote Globus endpoints by 32%-38%.
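To make "error-controlled lossy compression" and the use of mask-map information more concrete, here is a minimal sketch. It is not CliZ's implementation. It assumes a generic SZ-style scheme: each value is predicted from the previously reconstructed value, and the residual is linearly quantized with bin width 2·eb, which guarantees |x − x'| ≤ eb for every stored point. The function names `compress_1d`/`decompress_1d` and the boolean `mask` (True for a valid cell, for example an ocean point; False for a masked cell such as land) are hypothetical. Entropy coding of the integer codes, which is where an encoder such as CliZ's multi-Huffman method would act, is omitted.

```python
import numpy as np


def compress_1d(data, eb, mask=None):
    """Hypothetical sketch: predict each valid value from the previous
    reconstructed valid value and quantize the residual so that the
    reconstruction error never exceeds eb. Masked points are skipped."""
    data = np.asarray(data, dtype=np.float64)
    if mask is None:
        mask = np.ones(data.shape, dtype=bool)
    codes = np.zeros(data.shape, dtype=np.int64)
    prev = 0.0  # last reconstructed valid value; the decoder tracks the same value
    for i, x in enumerate(data):
        if not mask[i]:
            continue  # masked cells (e.g. land) are neither stored nor used for prediction
        q = int(np.round((x - prev) / (2 * eb)))  # linear quantization of the residual
        codes[i] = q
        prev = prev + 2 * eb * q  # update with the *reconstructed* value, not the original
    return codes


def decompress_1d(codes, eb, mask, fill=np.nan):
    """Invert compress_1d; masked cells are filled with `fill`."""
    out = np.full(codes.shape, fill, dtype=np.float64)
    prev = 0.0
    for i, q in enumerate(codes):
        if not mask[i]:
            continue
        prev = prev + 2 * eb * q  # same arithmetic as the encoder, so values match exactly
        out[i] = prev
    return out


# Usage: a smooth random walk with a masked "land" segment.
rng = np.random.default_rng(0)
field = np.cumsum(rng.normal(size=200))
valid = np.ones(200, dtype=bool)
valid[50:80] = False
codes = compress_1d(field, 0.01, valid)
restored = decompress_1d(codes, 0.01, valid)
```

Because masked points are skipped, they cost nothing to encode and never enter a prediction, which is one plausible way a mask map can help prediction accuracy; the decoder must, however, have access to the same mask.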

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Pages: 417-429
Number of pages: 13
ISBN (Electronic): 9798350337662
State: Published - 2024
Event: 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - San Francisco, United States
Duration: May 27 2024 - May 31 2024

Publication series

Name: Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024

Conference

Conference: 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024
Country/Territory: United States
City: San Francisco
Period: 5/27/24 - 5/31/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • climate datasets
  • distributed data repository/database
  • error-controlled lossy compression

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
