Abstract
Today's scientific high-performance computing (HPC) applications often run in large-scale environments and produce extremely large volumes of data that must be compressed effectively for efficient storage or transfer. Error-bounded lossy compression is arguably the most efficient approach to this end, because it can achieve very high compression ratios while strictly controlling data distortion according to user-specified error bounds. However, error-bounded lossy compressors may introduce serious artifacts when the error bound is relatively large or the compression ratio is high, which is highly undesirable to users. In this paper, we comprehensively characterize the artifacts produced by multiple state-of-the-art error-bounded lossy compressors (including SZ-1.4, SZ-2.1, SZ-3.0, FPZIP, ZFP, and MGARD) and provide an in-depth analysis of their root causes. We classify the artifacts into three types and develop an efficient detection algorithm for each type. Finally, we evaluate our artifact detection methods on four scientific datasets, demonstrating that the proposed methods can detect artifacts in linear time.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics, HiPC 2023 |
| Pages | 117-126 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798350383225 |
| State | Published - 2023 |
| Event | 30th Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2023 - Goa, India Duration: Dec 18 2023 → Dec 21 2023 |
Publication series
| Name | Proceedings - 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics, HiPC 2023 |
|---|---|
Conference
| Conference | 30th Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2023 |
|---|---|
| Country/Territory | India |
| City | Goa |
| Period | 12/18/23 → 12/21/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Funding
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations (the Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, to support the nation's exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357, and by the National Science Foundation under Grants OAC-2003709, OAC-2104023, and OAC-2330367. We acknowledge the computing resources provided by the Center for Computational Science of the University of Kentucky.
| Funders | Funder number |
|---|---|
| National Nuclear Security Administration | |
| University of Kentucky | |
| U.S. Department of Energy EPSCoR | |
| Office of Science Programs | |
| National Science Foundation | OAC-2330367, OAC-2003709, OAC-2104023 |
| Advanced Scientific Computing Research | DE-AC02-06CH11357 |
Keywords
- High-performance computing
- compression artifacts
- lossy compression
- scientific data
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Information Systems
- Information Systems and Management