Collaborative Research: Elements: ProDM: Developing A Unified Progressive Data Management Library for Exascale Computational Science


Description

Overview: Today’s extreme-scale simulations and high-resolution instruments produce scientific data in unprecedented volumes and at unprecedented rates, making it increasingly difficult to turn that data into scientific insight through data analytics. Because of the sheer size of the data and the diverse requirements of scientific analytics, there is a growing need to manage data progressively, so that users can stream only as much data as their analytics require. Although many scientific simulations, observations, and instruments would benefit from this capability, a significant gap remains in putting it into practice, because no single progressive representation works for all applications. Furthermore, little effort has gone into building robust and scalable cyberinfrastructures (CIs) that link algorithmic innovations in progressive representation with scientific data analytics, which leaves the deployment of such services a major challenge for scientists. In this project, we aim to develop a sustainable framework that supports progressive management of scientific data to facilitate its use in scientific applications. The core scope includes unifying viable progressive representations and tailoring development for in-situ and post-hoc analytic routines. Driven by the needs of real-world applications, this project will advance a wide range of disciplines by creating opportunities for novel and accelerated scientific discoveries.

Intellectual Merit: This project proposes to substantially extend the existing data management capabilities of ADIOS-2, particularly its support for in-situ and post-hoc analytics, to enable progressive data storage, transmission, and access by deeply incorporating viable progressive techniques. This will be accomplished by developing three engines in ProDM.
First, a data engine will be built to unify progressive representations and provide recommendations based on application needs, along with portable hardware support for accelerators and interoperable software interfaces to data analytics libraries. Second, an in-situ engine will be developed to facilitate the use of progressive representations in in-situ data analytics, which includes redesigning in-situ semantics and adjusting runtime dynamics. Third, a post-hoc engine will be developed to incorporate erasure coding and hierarchical data placement into progressive representations, improving reliability and performance in post-hoc data analytics. The proposed framework will also be integrated with state-of-the-art data management software to produce an enhanced library with extended features, and it will be deployed on campus-wide computing infrastructures for comprehensive evaluation with important applications from climate, fusion, and molecular dynamics.

Broader Impacts: The project will have broad impacts on a wide range of science and engineering areas that face challenges in data management and analytics, as laid out in the solicitation. These include “capability of real- and near-real-time manipulation of data” in GEO, “software and tools that enable progress on key questions in astronomy and astrophysics” in AST, and “support for enabling data-driven discovery in molecular science” in CHE under MPS, among others. Success of this project will establish a new paradigm for scientific data management, enabling novel scientific discoveries and/or reduced time to insight across many disciplines. Its deliverables will be released as open-source libraries and disseminated through related workshops and tutorials. Moreover, the project will bring application scientists and computer scientists into close collaboration, building synergies and fostering discussions that could improve the national CI ecosystem.
Furthermore, this proposed work will contribute to the education, training, and workforce development of future CI users and developers, including K-12, undergraduate, and graduate students, through new course materials and outreach activities at UK, NJIT, and Temple.
Status: Active
Effective start/end date: 8/1/23 to 7/31/26

Funding

  • National Science Foundation: $239,952.00
