Maintaining Trust in Reduction: Preserving the Accuracy of Quantities of Interest for Lossy Compression

Qian Gong, Xin Liang, Ben Whitney, Jong Youl Choi, Jieyang Chen, Lipeng Wan, Stéphane Ethier, Seung Hoe Ku, R. Michael Churchill, C. S. Chang, Mark Ainsworth, Ozan Tugluk, Todd Munson, David Pugmire, Richard Archibald, Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

As the growth of data sizes continues to outpace computational resources, there is a pressing need for data reduction techniques that can significantly reduce the amount of data and quantify the error incurred in compression. Compressing scientific data presents many challenges for reduction techniques because the data is often defined on non-uniform or unstructured meshes, comes from a high-dimensional space, and has many Quantities of Interest (QoIs) that need to be preserved. To illustrate these challenges, we focus on data from a large-scale fusion code, XGC. XGC uses a Particle-In-Cell (PIC) technique that generates hundreds of petabytes (PB) of data a day, from thousands of timesteps. XGC uses an unstructured mesh and needs to compute many QoIs from the raw data, f. One critical aspect of the reduction is that we need to ensure that QoIs derived from the data (density, temperature, flux-surface-averaged momenta, etc.) maintain relatively high accuracy. We show that by compressing XGC data on the high-dimensional, non-uniform grid on which the data is defined, and adaptively quantizing the decomposed coefficients based on the characteristics of the QoIs, the compression ratios obtained at various error tolerances using a multilevel compressor (MGARD) increase by more than a factor of ten. We then present how to mathematically guarantee that the accuracy of the QoIs computed from the reduced f is preserved during the compression. We show that the error in the XGC density can be kept under a user-specified tolerance over 1000 timesteps of simulation using the mathematical QoI error control theory of MGARD, whereas traditional error control on the data to be reduced does not guarantee the accuracy of the QoIs.
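The difference between bounding the error in f and bounding the error in a QoI can be illustrated with a minimal sketch. It is not MGARD's algorithm or API: it assumes a hypothetical linear QoI Q(f) = sum_i w_i f_i (a density-like weighted sum), synthetic data, and plain uniform quantization standing in for a real compressor. For such a QoI, a pointwise bound |f_i - f~_i| <= eps gives |Q(f) - Q(f~)| <= eps * sum_i |w_i|, so a QoI tolerance tau can be met by choosing eps = tau / sum_i |w_i|. All names and values below are illustrative assumptions.

```python
import math
import random


def quantize(values, eps):
    """Uniform scalar quantization: each value moves by at most eps."""
    step = 2.0 * eps
    return [step * round(v / step) for v in values]


def linear_qoi(values, weights):
    """Hypothetical linear QoI: a weighted sum of the data samples."""
    return math.fsum(w * v for w, v in zip(weights, values))


def eps_for_qoi_tolerance(weights, tau):
    """Pointwise data bound that keeps the linear QoI error within tau."""
    return tau / math.fsum(abs(w) for w in weights)


random.seed(0)
f = [random.uniform(0.0, 1.0) for _ in range(1000)]  # synthetic stand-in for raw data
w = [random.uniform(0.5, 1.5) for _ in range(1000)]  # assumed quadrature-like weights
tau = 1e-3                                           # user-specified QoI tolerance

# QoI-aware choice: derive the data bound from the QoI tolerance.
eps = eps_for_qoi_tolerance(w, tau)
f_reduced = quantize(f, eps)
max_data_error = max(abs(a - b) for a, b in zip(f, f_reduced))
qoi_error = abs(linear_qoi(f, w) - linear_qoi(f_reduced, w))

# Data-only choice: a perturbation that respects a pointwise bound of tau
# can still shift the QoI by roughly tau * sum(w), far more than tau.
f_shifted = [v + tau for v in f]
naive_qoi_error = abs(linear_qoi(f, w) - linear_qoi(f_shifted, w))

print(f"eps = {eps:.3e}, QoI error = {qoi_error:.3e} (tolerance {tau:.1e})")
print(f"data-only bound of {tau:.1e} allows QoI error = {naive_qoi_error:.3e}")
```

The bound used here is the worst case, so the observed QoI error is usually much smaller than tau. A tighter, QoI-aware allocation of the error budget is what allows higher compression ratios at the same QoI tolerance.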

Original language: English
Title of host publication: Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Revised Selected Papers
Editors: Jeffrey Nichols, Arthur ‘Barney’ Maccabe, James Nutaro, Swaroop Pophale, Pravallika Devineni, Theresa Ahearn, Becky Verastegui
Pages: 22-39
Number of pages: 18
State: Published - 2022
Event: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021 - Virtual, Online
Duration: Oct 18 2021 → Oct 20 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1512 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021
City: Virtual, Online
Period: 10/18/21 → 10/20/21

Bibliographical note

Publisher Copyright:
© 2022, Springer Nature Switzerland AG.

Funding

Acknowledgement. This research was supported by the ECP CODAR, Sirius-2, and RAPIDS-2 projects through the Advanced Scientific Computing Research (ASCR) program of the Department of Energy, and the LDRD project through the DRD program of Oak Ridge National Laboratory.

Funders
U.S. Department of Energy EPSCoR
Advanced Scientific Computing Research
Oak Ridge National Laboratory
Laboratory Directed Research and Development

    Keywords

    • Error control
    • Lossy compression
    • Quantities of interest
    • XGC simulation data

    ASJC Scopus subject areas

    • General Computer Science
    • General Mathematics
