Grants and Contracts Details
Description
Project Summary
Overview:
The goal of this project is to develop an efficient and scalable trust-driven lossy data compression
infrastructure capable of controlling errors in downstream Quantities of Interest (QoIs) derived
from raw data, thereby making lossy compression usable and adoptable by the scientific community.
Scientific simulations running in advanced cyberinfrastructures are producing data at unprecedented
speeds and volumes, making effective data reduction a necessity. However, existing reduction
techniques either overlook error quantification or provide error control only for raw data, leaving
the error in downstream QoIs computed from these raw data uncontrolled. This project aims
to bridge this gap to improve the trustability of lossy data reduction. Outcomes will be integrated
into state-of-the-art compression frameworks and validated using significant QoIs from multiple
applications across different domains in advanced cyberinfrastructures. Driven by actual concerns
from application scientists, the success of this project is expected to facilitate the use of lossy data
reduction in the scientific community for efficient data storage, transmission, and analytics.
Intellectual Merit:
Although lossy compression is recognized as a viable way to cope with today’s data volumes,
the lack of uncertainty or error quantification on downstream QoIs derived from raw data is a crucial
issue. This creates a dilemma for scientists. While storing all data with exact precision fosters new
scientific discovery, it is nearly impossible due to the sheer volume. Yet compressing the data
without proper care undermines the trustability of discoveries made from the reduced data. This
proposed research will address these problems through a marriage of theory and implementation.
First, a novel theory enabling error control on downstream QoIs will be developed. This will
fundamentally address the trustability issues of existing error-controlled lossy compressors, which
provide error control only on raw data. Second, an optimization method ensuring tight error control
will be proposed based on in-depth analysis. This will allow for higher compression ratios under the
same requirements. Third, a scalable infrastructure will be built through careful integration with
state-of-the-art compression frameworks and tailored parallelization based on target QoIs, in order
to take full advantage of existing compression algorithms and computational patterns in the target
QoIs. The proposed infrastructure will also be incorporated into real-world scientific applications
to reduce the size of their data for full-state data storage, fast I/O, and efficient transmission.
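As a minimal illustration of why error control on raw data alone does not bound the error in a
derived QoI, consider the Python sketch below. It is not the project's method. The QoI f(x) = x^2,
the uniform quantizer standing in for a lossy compressor, and the names `quantize` and
`raw_bound_for_square` are all assumptions chosen for illustration. For this particular QoI,
|x'^2 - x^2| = |x' - x| * |x' + x| <= eps * (2|x| + eps), so a QoI tolerance tau can be met by
choosing a per-value raw bound eps <= sqrt(x^2 + tau) - |x|.

```python
import math


def quantize(x: float, eps: float) -> float:
    """Stand-in lossy compressor (assumption): uniform quantization
    whose pointwise absolute error is at most eps."""
    step = 2.0 * eps
    return step * round(x / step)


def raw_bound_for_square(x: float, tau: float) -> float:
    """Largest raw-data bound eps such that |x'^2 - x^2| <= tau
    whenever |x' - x| <= eps, for the illustrative QoI f(x) = x^2.

    Algebraically equal to sqrt(x^2 + tau) - |x|, written in a form
    that avoids cancellation when |x| is large.
    """
    return tau / (math.sqrt(x * x + tau) + abs(x))


if __name__ == "__main__":
    tau = 1e-2  # hypothetical tolerance on the QoI
    for x in (0.5, 10.0, 1000.007):
        # Fixed raw-data bound: the QoI error grows with |x|.
        naive = quantize(x, 1e-2)
        # QoI-aware bound: eps shrinks as |x| grows.
        aware = quantize(x, raw_bound_for_square(x, tau))
        print(x, abs(naive**2 - x**2), abs(aware**2 - x**2))
```

In this sketch the same raw-data bound that is harmless for small values can violate the QoI
tolerance for large ones, while the QoI-derived bound keeps the squared value within tau at the
cost of a finer quantization step.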
Broader Impacts:
The project, if successful, will contribute to next-generation data reduction and management in
advanced cyberinfrastructures, including the upcoming exascale computing systems. The proposed
research will enable application scientists to store the most valuable information in their data based
on their unique needs, creating opportunities for novel scientific findings. It will reduce the time to
insight for multiple scientific domains including cosmology, climatology, and seismology. Moreover,
it will create a platform for application scientists and computer scientists to work together, building
up synergies and fostering new collaborations. The proposed algorithms will be implemented in
open-source tools and optimized for high-end computing systems, which will enhance the research
infrastructure for the scientific community. Results of this project will be submitted for publication
in relevant peer-reviewed conferences and journals. Furthermore, this proposed work will contribute
to the education and training of K-12, undergraduate, and graduate students via new course
materials and outreach activities at Missouri University of Science and Technology (Missouri S&T),
providing students a deep understanding of big data management and enhancing their ability to
conduct interdisciplinary research.
| Status | Active |
|---|---|
| Effective start/end date | 7/1/23 → 3/31/25 |
Funding
- National Science Foundation: $151,695.00