Abstract
Today's exascale scientific applications and advanced instruments produce vast volumes of data, which must be shared or transferred over networks and devices with relatively low bandwidth (e.g., data sharing over a WAN or transfers from edge devices to supercomputers). Lossy compression is one of the candidate strategies for addressing this big-data issue. However, little work has been done to make it resilient against silent errors, which may occur during compression or data transfer. In this paper, we propose a resilient error-bounded lossy compressor based on the SZ compression framework. Specifically, we design a new independent-block-wise model that decomposes the entire dataset into many independent sub-blocks for compression. We then design and implement a series of error detection/correction strategies tailored to each stage of SZ. Our method is arguably the first algorithm-based fault tolerance (ABFT) solution for lossy compression. Our proposed solution incurs negligible execution overhead in the fault-free situation. When soft errors occur, it ensures that the decompressed data remain strictly bounded within the user's requirement, with very limited degradation of compression ratio and low overhead.
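To make the block-wise idea concrete, the following is a minimal sketch and not the paper's SZ-based implementation. It assumes plain uniform quantization in place of SZ's prediction and encoding stages, and a CRC32 checksum per block as a stand-in for the paper's stage-specific detection/correction strategies. All function names (`compress_block`, `decompress_block`, `compress`, `decompress`) are hypothetical. It illustrates only two properties named in the abstract: each value is reconstructed within the error bound, and a silent error in one block is detected without affecting the other blocks.

```python
import struct
import zlib


def compress_block(values, error_bound):
    """Compress one block independently (assumed scheme: uniform quantization)."""
    step = 2.0 * error_bound
    # Rounding to the nearest multiple of `step` keeps each value within error_bound.
    codes = [round(v / step) for v in values]
    payload = zlib.compress(struct.pack(f"<{len(codes)}q", *codes))
    # Per-block checksum (assumption: CRC32) used to detect silent corruption.
    return payload, zlib.crc32(payload)


def decompress_block(payload, checksum, error_bound):
    """Return the reconstructed values, or None if the block fails its checksum."""
    if zlib.crc32(payload) != checksum:
        return None
    raw = zlib.decompress(payload)
    codes = struct.unpack(f"<{len(raw) // 8}q", raw)
    step = 2.0 * error_bound
    return [c * step for c in codes]


def compress(data, block_size, error_bound):
    """Split the data into independent blocks and compress each one separately."""
    return [
        compress_block(data[i:i + block_size], error_bound)
        for i in range(0, len(data), block_size)
    ]


def decompress(blocks, error_bound):
    """Decompress every block; a corrupted block yields None, the rest are unaffected."""
    return [decompress_block(p, c, error_bound) for p, c in blocks]
```

Because the blocks share no state, a block that fails its check can be re-requested or recomputed on its own. This sketch only detects and isolates the error; it does not attempt the correction that the abstract describes.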
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2021 |
Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond |
ISBN (Electronic) | 9781450384421 |
State | Published - Nov 14 2021 |
Event | 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 - Virtual, Online, United States. Duration: Nov 14 2021 → Nov 19 2021 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 11/14/21 → 11/19/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE Computer Society. All rights reserved.
Keywords
- Algorithm Based Fault Tolerance
- Data transfer
- Lossy compression
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Software