Collaborative Research: Elements: SGCC: An Efficient GPU-Oriented Data Reduction Cyberinfrastructure for Scientific Data Analysis

  • Tian, Jiannan (PI)

Project details

Description

Ever-growing volumes and rates of scientific data generated for post hoc analysis on heterogeneous platforms present severe memory- and data-movement-centric challenges for many science domains, calling for efficient data-reduction technologies. Although GPUs now dominate heterogeneous computing, the ecosystem of GPU-based scientific data compressors is still maturing and needs to address the following gaps: 1) vendor (CUDA) lock-in; 2) the lack of in-depth integration into scientific workflows; 3) the lack of a user-friendly interface and an out-of-the-box solution. To promote the ease of use of GPU-based scientific data compressors, we will work with science partners from the domains of cosmology and materials science to create an ecosystem called SGCC (Scientific GPU Compression Cyberinfrastructure). SGCC is intended to automate the end-to-end workflow, with GPU compression inlined with data staging, in situ data processing, and data-driven post hoc analysis. Eventually, SGCC will provide users with user-friendly, high-performance, GPU-accelerated data reduction on all major GPU-equipped supercomputing platforms.
We will build SGCC, a user-friendly cyberinfrastructure offering an end-to-end solution for GPU-accelerated data compression in scientific applications, by porting, adapting, optimizing, and extending multiple existing capabilities, including cuSZ (the GPU version of SZ, the state-of-the-art error-bounded lossy compression framework), standalone GPU-based lossless encoders, QCAT (a state-of-the-art CPU-based compression quality assessment toolset), Kokkos (the multi-GPU-backend performance-portability ecosystem), LibPressio (the unified programming interface to various scientific compressors), and the HDF5 I/O library.
To craft SGCC, we will combine three thrusts: (1) Promote backend capability: ensure an efficient analysis workflow integrated with GPU-accelerated scientific data compressors, providing adequate data compression schemes for diverse data formats (binary, HDF5, etc.) and enabling quality-oriented autotuning of compression configurations; (2) Promote frontend capability: improve the user interface of the GPU-accelerated data-reduction ecosystem by providing high-level language bindings (e.g., Python), command-line interface (CLI) tools, and a graphical user interface (GUI) integrated with visualization functionality, such as an embedded view in a Jupyter notebook, which can advance user productivity; (3) Promote portability: enable state-of-the-art GPU-accelerated scientific data compressors on emerging heterogeneous computing platforms, such as those based on NVIDIA, AMD, and Intel GPUs. SGCC will focus its development efforts on the use of GPU-accelerated lossy compression in science domains and GPU-platform operations, including cosmology and materials science (X-ray). Through collaborations with multiple scientific and operational partners, we will jointly optimize the performance of SGCC and evaluate its implementation. SGCC will mitigate the data challenges on emerging supercomputing systems, improve post hoc analysis efficiency for users, and accelerate scientific discovery through the greater scalability that data reduction provides.
This project will continuously contribute to the education and training of graduate students by enhancing the quality of computing-related curricula in heterogeneous scientific computing, scientific data management, and compression/visualization through outreach activities at the University of Chicago, the University of Houston, and the University of Kentucky. To ensure the sustainability of SGCC as an open-source project, we will carry out several actions, including standalone software releases and maintenance and integration into widely used systems.
We anticipate that this effort will have a significant long-term impact, which can be assessed by multiple metrics such as download statistics and positive feedback from users.
Status: Finished
Start date/End date: 8/1/25 → 8/1/25
