Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello

Research output: Contribution to journal › Article › peer-review

58 Scopus citations

Abstract

With ever-increasing volumes of scientific data produced by high-performance computing applications, significantly reducing data size is critical because of the limited capacity of storage space and potential I/O or network bottlenecks when writing, reading, or transferring data. SZ and ZFP are two leading BSD-licensed open-source C/C++ libraries for compressed floating-point arrays that support high-throughput read and write random access. However, their performance is not consistent across different data sets, or even across different fields of some data sets, which raises the need for automatic online (during compression) selection between SZ and ZFP with minimal overhead. In this paper, the automatic selection optimizes the rate-distortion, an important statistical quality metric based on the signal-to-noise ratio. To optimize for rate-distortion, we investigate the principles of SZ and ZFP. We then propose an efficient, low-overhead online selection algorithm that accurately predicts the compression quality of both compressors in early processing stages and selects the best-fit compressor for each data field. We implement the selection algorithm in an open-source library and evaluate the effectiveness of our proposed solution against plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results on three data sets representing about 100 fields show that our selection algorithm improves the compression ratio by up to 70 percent at the same level of data distortion, because it selects the best-fit compressor very accurately (around 99 percent of the time), with little overhead (less than 7 percent in the experiments).
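The selection idea in the abstract can be illustrated with the two quantities that make up a rate-distortion comparison: bit rate (bits stored per value) and PSNR, computed here as 20·log10(value range) − 10·log10(MSE). The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm: it does not reproduce how SZ or ZFP quality is predicted in early processing stages, and instead takes per-compressor estimates as inputs. The function names (`psnr`, `bit_rate`, `select_compressor`) and the tie-breaking rule are hypothetical; the rule "lowest bit rate at matched distortion" simply mirrors the abstract's framing of a higher compression ratio at the same level of distortion.

```python
import math
from typing import Dict, Sequence, Tuple


def psnr(original: Sequence[float], decompressed: Sequence[float]) -> float:
    """PSNR in dB, using the value range of the original data as the peak."""
    n = len(original)
    value_range = max(original) - min(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, decompressed)) / n
    if mse == 0.0:
        return math.inf  # lossless reconstruction
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(mse)


def bit_rate(compressed_bytes: int, num_values: int) -> float:
    """Average number of bits stored per data value."""
    return 8.0 * compressed_bytes / num_values


def select_compressor(estimates: Dict[str, Tuple[float, float]]) -> str:
    """Hypothetical selection rule, not the paper's exact criterion.

    `estimates` maps a compressor name to (estimated_bit_rate, estimated_psnr),
    assumed to be predicted for the same target distortion. The lowest bit
    rate wins; ties are broken in favour of the higher PSNR.
    """
    return min(estimates, key=lambda name: (estimates[name][0], -estimates[name][1]))
```

For example, `select_compressor({"SZ": (2.1, 80.3), "ZFP": (2.9, 80.1)})` returns `"SZ"`, because SZ is estimated to need fewer bits per value at essentially the same PSNR. In a per-field setting, a rule like this would be applied independently to each data field.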

Original language: English
Article number: 8621017
Pages (from-to): 1857-1871
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 30
Issue number: 8
State: Published - Aug 1 2019

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Lossy compression
  • compression ratio
  • high-performance computing
  • rate-distortion
  • scientific data

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
