Toward Quantity-of-Interest Preserving Lossy Compression for Scientific Data

Pu Jiao, Sheng Di, Hanqi Guo, Kai Zhao, Jiannan Tian, Dingwen Tao, Xin Liang, Franck Cappello

Research output: Contribution to journal › Article › peer-review

Abstract

Today’s scientific simulations and instruments produce large amounts of data, which makes the data difficult to store, transmit, and analyze. While error-controlled lossy compressors are effective in significantly reducing data volumes and efficiently developing databases for multiple scientific applications, they mainly support error control on raw data, which leaves a significant gap between the data and the user’s downstream analysis. This may cause unquantified uncertainties in the outcomes of the analysis, a.k.a. quantities of interest (QoIs), which are the major concern of users adopting lossy compression in practice. In this paper, we propose rigorous mathematical theories, along with practical implementations, to preserve four families of QoIs that are widely used in scientific analysis during lossy compression. Specifically, we first develop the error control theory for univariate QoIs, which are essential for computing physical properties such as kinetic energy, followed by multivariate QoIs, which are more commonly used in real-world applications. The proposed method is integrated into a state-of-the-art compression framework in a modular fashion, so it can easily adapt to new QoIs and new compression algorithms. Experiments on real-world datasets demonstrate that the proposed method provides faithful error control on important QoIs, including kinetic energy, regional average, and isosurface, without trial and error, while offering compression ratios up to 4× those of state-of-the-art compressors.

Original language: English
Pages (from-to): 697-710
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 16
Issue number: 4
State: Published - 2022

Bibliographical note

Funding Information:
This work was supported by the National Science Foundation under Grants OAC-2003709, OAC-2042084/2303064, OAC-2104023, and OAC-2153451. The material was supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This research was also supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. This work used the Foundry cluster that was supported by the National Science Foundation under Grant OAC-1919789.

Publisher Copyright:
© 2022, VLDB Endowment. All rights reserved.

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)
