Abstract
Today’s scientific simulations and instruments produce large amounts of data, which makes the data difficult to store, transmit, and analyze. Error-controlled lossy compressors significantly reduce data volumes and help build databases efficiently for many scientific applications, but they mainly support error control on raw data, which leaves a significant gap between the data and the user’s downstream analysis. This may cause unqualified uncertainties in the outcomes of the analysis, also known as quantities of interest (QoIs), and these uncertainties are a major concern for users considering lossy compression in practice. In this paper, we propose rigorous mathematical theories, along with practical implementations, for preserving during lossy compression four families of QoIs that are widely used in scientific analysis. Specifically, we first develop the error control theory for univariate QoIs, which are essential for computing physical properties such as kinetic energy, and then for multivariate QoIs, which are more commonly used in real-world applications. The proposed method is integrated into a state-of-the-art compression framework in a modular fashion, so it can easily be adapted to new QoIs and new compression algorithms. Experiments on real-world datasets demonstrate that the proposed method provides faithful error control on important QoIs, including kinetic energy, regional average, and isosurface, without trial and error, while offering compression ratios up to 4× those provided by state-of-the-art compressors.
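To make the idea of error control on a univariate QoI concrete, the following is a minimal sketch, not the paper's implementation. It assumes the QoI is the square of a single value (as arises, for example, when a velocity component enters a kinetic energy term) and derives the largest pointwise error bound on the raw value that keeps the QoI error within a user tolerance. The function name `pointwise_bound_for_square` and the parameter `tau` are hypothetical, chosen for illustration only.

```python
import math


def pointwise_bound_for_square(x: float, tau: float) -> float:
    """Return the largest eps such that |x'**2 - x**2| <= tau
    for every x' with |x' - x| <= eps.

    Illustrative assumption: the QoI is f(x) = x**2.
    The worst case is |2*x*e + e**2| <= 2*|x|*eps + eps**2, so solving
    2*|x|*eps + eps**2 = tau gives eps = sqrt(x**2 + tau) - |x|.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if tau == 0:
        return 0.0
    # Algebraically equal to sqrt(x**2 + tau) - |x|, but written this way
    # to avoid cancellation when |x| is large compared with tau.
    return tau / (math.sqrt(x * x + tau) + abs(x))


if __name__ == "__main__":
    x, tau = 3.0, 7.0
    eps = pointwise_bound_for_square(x, tau)
    print(eps)  # bound on the raw value
    print(abs((x + eps) ** 2 - x ** 2))  # QoI error at the worst case
```

A compressor that accepts a pointwise error bound could, under these assumptions, be given `eps` for each value so that the squared quantity stays within `tau` without repeated trial runs.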
Original language | English |
---|---|
Pages (from-to) | 697-710 |
Number of pages | 14 |
Journal | Proceedings of the VLDB Endowment |
Volume | 16 |
Issue number | 4 |
DOIs | |
State | Published - 2022 |
Bibliographical note
Funding Information: This work was supported by the National Science Foundation under Grants OAC-2003709, OAC-2042084/2303064, OAC-2104023, and OAC-2153451. The material was supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This research was also supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. This work used the Foundry cluster that was supported by the National Science Foundation under Grant OAC-1919789.
Publisher Copyright:
© 2022, VLDB Endowment. All rights reserved.
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computer Science (all)