Project details
Description
Ever-growing volumes and rates of scientific data generated for post hoc analysis on heterogeneous platforms present severe memory- and data-movement-centric challenges for many science domains, calling for efficient data-reduction technologies. Although GPUs now dominate heterogeneous computing, the ecosystem of GPU-based scientific data compressors is still maturing and must address the following gaps: (1) vendor (CUDA) lock-in; (2) lack of in-depth integration with scientific workflows; and (3) lack of user-friendly interfaces and out-of-the-box solutions. To make GPU-based scientific data compressors easier to use, we will work with science partners from cosmology and materials science to create an ecosystem called SGCC (Scientific GPU Compression Cyberinfrastructure). SGCC is intended to automate the end-to-end workflow, with GPU compression inlined with data staging, in situ data processing, and data-driven post hoc analysis. Ultimately, SGCC will provide users with user-friendly, high-performance, GPU-accelerated data reduction on all major GPU-equipped supercomputing platforms.
We will build SGCC, a user-friendly cyberinfrastructure that provides an end-to-end solution for GPU-accelerated data compression in scientific applications, by porting, adapting, optimizing, and extending multiple existing capabilities: cuSZ (the GPU version of SZ, the state-of-the-art error-bounded lossy compression framework), standalone GPU-based lossless encoders, QCAT (a state-of-the-art CPU-based toolset for compression quality assessment), Kokkos (a performance-portability ecosystem with multiple GPU backends), LibPressio (a unified programming interface for various scientific compressors), and the HDF5 I/O library. To craft SGCC, we will combine three thrusts:
(1) Promote backend capability: ensure an efficient analysis workflow integrated with GPU-accelerated scientific data compressors, provide adequate compression schemes for diverse data formats (binary, HDF5, etc.), and enable quality-oriented autotuning of compression configurations.
(2) Promote frontend capability: improve the user interface of the GPU-accelerated data-reduction ecosystem by providing high-level language bindings (e.g., Python), command-line tools (CLI), and a graphical user interface (GUI) with integrated visualization, such as an embedded view in Jupyter notebooks, to improve user productivity.
(3) Promote portability: enable state-of-the-art GPU-accelerated scientific data compressors on emerging heterogeneous computing platforms from vendors such as NVIDIA, AMD, and Intel.
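To make "error-bounded lossy compression" concrete, here is a simplified, hypothetical 1-D sketch loosely in the spirit of the pre-quantization plus Lorenzo-prediction design associated with cuSZ. It is not cuSZ's code or API: the functions `compress`, `decompress`, and `psnr` are illustrative names introduced here, the data are assumed to be a plain list of floats, and the lossless-encoding stage (e.g., Huffman coding of the integer residuals) is omitted. PSNR is included only as one commonly used compression-quality metric.

```python
import math

# Hypothetical, simplified sketch; not the cuSZ implementation or API.


def compress(data, eb):
    """Error-bounded encoding: pre-quantize, then keep Lorenzo-style residuals."""
    # Pre-quantization: map each value to the index of the nearest multiple
    # of 2*eb, so the reconstruction error per value is at most eb.
    q = [round(x / (2 * eb)) for x in data]
    if not q:
        return []
    # 1-D Lorenzo prediction: predict each integer from its predecessor and
    # keep only the (usually small) integer residual.
    return [q[0]] + [q[i] - q[i - 1] for i in range(1, len(q))]


def decompress(codes, eb):
    """Invert the residuals with a prefix sum, then undo pre-quantization."""
    out, acc = [], 0
    for c in codes:
        acc += c
        out.append(acc * 2 * eb)
    return out


def psnr(original, reconstructed):
    """Peak signal-to-noise ratio (dB) relative to the data's value range."""
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 20 * math.log10(value_range) - 10 * math.log10(mse)


if __name__ == "__main__":
    data = [math.sin(i / 10) for i in range(100)]  # smooth synthetic field
    eb = 1e-3                                      # absolute error bound
    codes = compress(data, eb)
    rec = decompress(codes, eb)
    print("max error:", max(abs(a - b) for a, b in zip(data, rec)))
    print("largest residual:", max(abs(c) for c in codes[1:]))
    print("PSNR (dB):", round(psnr(data, rec), 1))
```

Because every value is rounded to the nearest multiple of 2·eb, the absolute reconstruction error stays within eb (up to floating-point rounding), while smooth data produce small integer residuals that a downstream lossless encoder can compress well.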
SGCC will focus its development efforts, and the use of GPU-accelerated lossy compression, on scientific applications and GPU-platform operations, including cosmology and X-ray materials science. Through collaborations with multiple scientific and operational partners, we will jointly optimize the performance of SGCC and evaluate its implementation. SGCC will mitigate data challenges on emerging supercomputing systems, improve post hoc analysis efficiency for users, and accelerate scientific discovery by improving scalability through data reduction. This project will continuously contribute to the education and training of graduate students by enhancing the quality of computing-related curricula in heterogeneous scientific computing, scientific data management, and compression/visualization through outreach activities at the University of Chicago, the University of Houston, and the University of Kentucky. To ensure the sustainability of SGCC as an open-source project, we will carry out several actions, including standalone software releases and maintenance and integration into widely used systems. We anticipate a significant long-term impact from this effort, which can be assessed through metrics such as download statistics and positive user feedback.
| Status | Completed |
|---|---|
| Start date/End date | 8/1/25 → 8/1/25 |