Grants and Contracts Details
Project Summary
Overview: The goal of this project is to develop an efficient and scalable trust-driven lossy data compression infrastructure capable of controlling errors in downstream Quantities of Interest (QoIs) derived from raw data, thereby making lossy compression usable and adoptable by the scientific community. Scientific simulations running in advanced cyberinfrastructures are producing data at unprecedented speeds and volumes, necessitating effective reduction. However, existing reduction techniques either overlook error quantification or provide error control only for raw data, leaving uncertainties in the outcome of downstream QoIs computed from these raw data. This project aims to bridge this gap to improve the trustability of lossy data reduction. Outcomes will be integrated into state-of-the-art compression frameworks and validated using significant QoIs from multiple applications across different domains in advanced cyberinfrastructures. Driven by actual concerns from application scientists, the success of this project is expected to facilitate the use of lossy data reduction in the scientific community for efficient data storage, transmission, and analytics.
Intellectual Merit: Although lossy compression is recognized as a viable way to cope with today's data volumes, the lack of uncertainty or error quantification on downstream QoIs derived from raw data is a crucial issue. This creates a dilemma for scientists. While storing all data with exact precision fosters new scientific discovery, it is nearly impossible due to the sheer volume of data. Yet compressing the data without proper care undermines the trustability of the discoveries gained from the reduced data. This proposed research will address these problems through a marriage of theory and implementation. First, a novel theory enabling error control on downstream QoIs will be developed.
This will fundamentally address the trustability issues of existing error-controlled lossy compressors, which provide error control only on raw data. Second, an optimization method ensuring tight error control will be proposed based on in-depth analysis. This will allow for higher compression ratios under the same requirements. Third, a scalable infrastructure will be built through careful integration with state-of-the-art compression frameworks and tailored parallelization based on target QoIs, in order to take full advantage of existing compression algorithms and computational patterns in the target QoIs. The proposed infrastructure will also be incorporated into real-world scientific applications to reduce the size of their data for full-state data storage, fast I/O, and efficient transmission.
Broader Impacts: The project, if successful, will contribute to next-generation data reduction and management in advanced cyberinfrastructures, including the upcoming exascale computing systems. The proposed research will enable application scientists to store the most valuable information in their data based on their unique needs, creating opportunities for novel scientific findings. It will reduce the time to insights for multiple scientific domains including cosmology, climatology, and seismology. Moreover, it will create a platform for application scientists and computer scientists to work together, building up synergies and fostering new collaborations. The proposed algorithms will be implemented in open-source tools and optimized on high-end computing systems, which will enhance the research infrastructure for the scientific community. Results of this project will be submitted for publication in relevant peer-reviewed conferences and journals.
Furthermore, this proposed work will contribute to the education and training of K-12, undergraduate, and graduate students via new course materials and outreach activities at Missouri University of Science and Technology (Missouri S&T), providing students with a deep understanding of big data management and enhancing their ability to conduct interdisciplinary research.
Effective start/end date: 7/1/23 → 3/31/25
- National Science Foundation: $151,695.00