Grants and Contracts Details
Description
Overview:
Today’s extreme-scale simulations and high-resolution instruments produce scientific data in unprecedented
volumes and at unprecedented rates, creating new difficulties in transforming the data into scientific insights via data
analytics. Due to the massive nature of the data and the diverse requirements of scientific analytics, there
is a growing need to manage data in a progressive fashion, such that users can stream as much data as
they need to carry out their analytics. While this is desired by many scientific simulations, observations,
and instruments, a significant gap exists in applying it in practice, as there is no universal representation
that works for all applications. Furthermore, little effort has gone into creating robust and scalable
cyberinfrastructures (CIs) that link the algorithmic innovations in progressive representation with scientific
data analytics, leaving the deployment of such a service a major challenge for scientists. In this project, we
aim to develop a sustainable framework that supports progressive management of scientific data to facilitate
its use in scientific applications. The core scope includes unifying viable progressive representations
and developing tailored in-situ and post-hoc analytic routines. Driven by the needs of real-world applications,
this project, if successful, will advance a wide range of disciplines by creating opportunities for
novel and accelerated scientific discoveries.
Intellectual Merit:
This project proposes to substantially extend existing data management capabilities in ADIOS-2, especially
the support of in-situ and post-hoc analytics, to allow for progressive data storage, transmission, and access
through deep integration of viable progressive techniques. This will be accomplished by the development of
three engines in ProDM. First, a data engine will be built to unify progressive representations and provide
recommendations based on application needs, along with portable hardware support for accelerators and
interoperable software interfaces to data analytic libraries. Second, an in-situ engine will be developed to
facilitate the use of progressive representations in in-situ data analytics, which includes redesigning in-situ
semantics and adjusting runtime dynamics. Third, a post-hoc engine will be developed to incorporate
erasure encoding and hierarchical data placement into progressive representations, in order to improve the
reliability and performance of post-hoc data analytics. The proposed framework will also be integrated
with state-of-the-art data management software to produce an enhanced library with extended features, and
deployed on campus-wide computing infrastructures for comprehensive evaluation with important applications
from climate, fusion, and molecular dynamics.
Broader Impacts:
The project has broad impacts on a wide range of science and engineering areas that are facing challenges
in data management and analytics as laid out in the solicitation. These include “capability of real- and
near-real-time manipulation of data” in GEO, “software and tools that enable progress on key questions in
astronomy and astrophysics” in AST, and “support for enabling data-driven discovery in molecular science”
in CHE under MPS, among others. Success of this project will establish a new paradigm of scientific data
management, allowing for novel scientific discoveries and/or reduced time to insights across a broad range of disciplines.
Its deliverables will be implemented in open-source libraries and disseminated in related workshops and tutorials.
Moreover, it will deeply involve application scientists and computer scientists working together, building
up synergies and fostering discussions that could improve the national CI ecosystem. Furthermore,
this proposed work will contribute to the education, training, and workforce development of future CI users
and developers including K-12, undergraduate, and graduate students via new course materials and outreach
activities at UK, NJIT, and Temple.
| Field | Value |
|---|---|
| Status | Active |
| Effective start/end date | 8/1/23 → 7/31/26 |
Funding
- National Science Foundation: $239,952.00