TY - JOUR
T1 - SZ3
T2 - A Modular Framework for Composing Prediction-Based Error-Bounded Lossy Compressors
AU - Liang, Xin
AU - Zhao, Kai
AU - Di, Sheng
AU - Li, Sihuan
AU - Underwood, Robert
AU - Gok, Ali M.
AU - Tian, Jiannan
AU - Deng, Junjing
AU - Calhoun, Jon C.
AU - Tao, Dingwen
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2023/4/1
Y1 - 2023/4/1
AB - Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. In practice, however, the best-fit compression method often needs to be customized or optimized because of the diverse characteristics of different datasets and the various user requirements on compression quality and performance. In this paper, we address this issue with a novel modular, composable compression framework named SZ3. Our contributions are fourfold. (1) We develop SZ3, which features an innovative modular abstraction for the prediction-based compression framework, such that compression modules can be plugged in easily to create new compressors based on the characteristics of the data and user requirements. (2) We create a new compression pipeline with SZ3 for GAMESS data, which significantly improves the compression ratios over state-of-the-art compressors. (3) We develop an adaptive compression pipeline with SZ3 for APS data with minimal effort, which leads to the best rate-distortion among all existing error-bounded lossy compressors for any bit-rate. (4) We compare the sustainability of SZ3 with leading error-bounded prediction-based compressors, and then demonstrate the necessity of diverse pipelines by integrating and evaluating several compression pipelines on diverse scientific datasets from multiple disciplines. Experiments show that SZ3 incurs very limited overhead in compressor integration and that our customized compression pipelines lead to up to 20% improvement in compression ratios under the same data distortion when compared with the best existing approach.
KW - Big data
KW - data reduction
KW - error-bounded lossy compression
KW - large-scale scientific simulation
UR - http://www.scopus.com/inward/record.url?scp=85137606689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137606689&partnerID=8YFLogxK
U2 - 10.1109/TBDATA.2022.3201176
DO - 10.1109/TBDATA.2022.3201176
M3 - Article
AN - SCOPUS:85137606689
SN - 2332-7790
VL - 9
SP - 485
EP - 498
JO - IEEE Transactions on Big Data
JF - IEEE Transactions on Big Data
IS - 2
ER -