Abstract
Benefiting from cutting-edge supercomputers that support extremely large-scale scientific simulations, climate research has advanced significantly over the past decades. However, new critical challenges have arisen regarding efficiently storing and transferring large-scale climate data among distributed repositories and databases for post hoc analysis. In this paper, we develop CliZ, an efficient online error-controlled lossy compression method with optimized data prediction and encoding methods for climate datasets across various climate models. On the one hand, we explore how to take advantage of particular properties of climate datasets (such as mask-map information, dimension permutation/fusion, and data periodicity patterns) to improve the data prediction accuracy. On the other hand, CliZ features a novel multi-Huffman encoding method, which can significantly improve the encoding efficiency and, in turn, the compression ratio. We evaluated CliZ against many other state-of-the-art error-controlled lossy compressors (including SZ3, ZFP, SPERR, and QoZ) on multiple real-world climate datasets from different models. Experiments show that CliZ outperforms the second-best compressor (SZ3, SPERR, or QoZ1.1) on climate datasets by 20%-200% in compression ratio. CliZ can also reduce the data transfer cost between two remote Globus endpoints by 32%-38%.
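For readers unfamiliar with the term, the general idea behind prediction-based, error-controlled lossy compression can be sketched as follows. This is a generic illustration only, not CliZ's predictor or encoder: the function names `compress_1d` and `decompress_1d`, the previous-value predictor, and the 1D setting are all assumptions made for the example. Each value is predicted from the previously reconstructed value, and the residual is quantized into integer bins of width twice the error bound, so the pointwise error stays within the bound. The resulting small integer codes are what an entropy coder such as Huffman coding would then compress.

```python
import numpy as np


def compress_1d(data, error_bound):
    """Hypothetical helper: previous-value prediction plus linear quantization.

    Returns the integer quantization codes and the reconstruction that a
    decompressor would produce from them.
    """
    step = 2.0 * error_bound          # bin width that keeps |error| <= error_bound
    codes = np.empty(len(data), dtype=np.int64)
    recon = np.empty(len(data), dtype=np.float64)
    prev = 0.0                        # predictor state: last reconstructed value
    for i, x in enumerate(data):
        # Quantize the prediction residual to the nearest bin index.
        code = int(round((float(x) - prev) / step))
        codes[i] = code
        # Predict from the reconstructed value, as the decompressor will.
        prev = prev + code * step
        recon[i] = prev
    return codes, recon


def decompress_1d(codes, error_bound):
    """Hypothetical helper: rebuild the values from the quantization codes."""
    step = 2.0 * error_bound
    out = np.empty(len(codes), dtype=np.float64)
    prev = 0.0
    for i, code in enumerate(codes):
        prev = prev + int(code) * step
        out[i] = prev
    return out


if __name__ == "__main__":
    # Synthetic smooth signal standing in for a climate variable (assumption).
    data = 280.0 + 5.0 * np.sin(np.linspace(0.0, 10.0, 200))
    eb = 0.01
    codes, recon = compress_1d(data, eb)
    print("max abs error:", np.max(np.abs(data - recon)))
    print("distinct codes after the first value:", len(set(codes[1:].tolist())))
```

Because smooth data yields residuals clustered near zero, most codes fall into a handful of bins. That is the property a better predictor improves and an entropy coder exploits.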
Original language | English |
---|---|
Title of host publication | Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
Pages | 417-429 |
Number of pages | 13 |
ISBN (Electronic) | 9798350337662 |
State | Published - 2024 |
Event | 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - San Francisco, United States. Duration: May 27 2024 → May 31 2024 |
Publication series
Name | Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
---|
Conference
Conference | 38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 5/27/24 → 5/31/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- climate datasets
- distributed data repository/database
- error-controlled lossy compression
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture