Despite the remarkable progress made in deep-learning-based depth map super-resolution (DSR), tackling real-world degradation in low-resolution (LR) depth maps remains a major challenge. Existing DSR models are generally trained and tested on synthetic datasets, which differ substantially from what a real depth sensor produces. In this paper, we argue that DSR models trained under this setting are restrictive and not effective in dealing with real-world DSR tasks. We make two contributions in tackling real-world degradation of different depth sensors. First, we propose to classify the generation of LR depth maps into two types: non-linear downsampling with noise and interval downsampling, for which DSR models are learned correspondingly. Second, we propose a new framework for real-world DSR, which consists of four modules: 1) an iterative residual learning module with deep supervision to learn effective high-frequency components of depth maps in a coarse-to-fine manner; 2) a channel attention strategy to enhance channels with abundant high-frequency components; 3) a multi-stage fusion module to effectively re-exploit the results of the coarse-to-fine process; and 4) a depth refinement module to improve the depth map by total generalized variation (TGV) regularization and an input loss. Extensive experiments on benchmark datasets demonstrate the superiority of our method over current state-of-the-art DSR methods.
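The two LR generation types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names, the block-averaging filter, the Gaussian noise model, and the `noise_std` and `seed` parameters are all assumptions made for illustration. The abstract does not specify the non-linear downsampling operator, so simple block averaging is substituted here as a placeholder for a filtered degradation. Interval downsampling is taken to mean keeping every `scale`-th pixel without filtering.

```python
import numpy as np


def interval_downsample(depth, scale):
    # Keep every `scale`-th sample along both axes, with no filtering.
    return depth[::scale, ::scale].copy()


def averaged_downsample_with_noise(depth, scale, noise_std=0.0, seed=0):
    # Placeholder (assumption) for a filtered, noisy degradation:
    # crop to a multiple of `scale`, average each scale x scale block,
    # then add zero-mean Gaussian noise with standard deviation `noise_std`.
    h, w = depth.shape
    h, w = h - h % scale, w - w % scale
    blocks = depth[:h, :w].reshape(h // scale, scale, w // scale, scale)
    lr = blocks.mean(axis=(1, 3))
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_std, size=lr.shape)


# Toy 8x8 "depth map" downsampled by a factor of 4 in both ways.
hr = np.arange(64, dtype=np.float64).reshape(8, 8)
lr_interval = interval_downsample(hr, 4)
lr_noisy = averaged_downsample_with_noise(hr, 4, noise_std=0.5)
```

Interval downsampling preserves the original depth values at the sampled positions, whereas the filtered variant mixes neighbouring depths and adds noise, which is why the abstract treats the two as separate degradation types with separately learned models.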
|Number of pages|10|
|Journal|Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|State|Published - 2020|
|Event|2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States|
|Duration|Jun 14 2020 → Jun 19 2020|
Bibliographical note
Funding Information:
In this paper, we have proposed an effective depth map super-resolution method that accounts for the real-world degradation processes of different types of physical depth sensors. We envisage applying our new method to super-resolve depth maps captured by commodity depth sensors such as Microsoft Kinect and LiDAR. We analyze two different LR depth map simulation schemes: non-linear downsampling and interval downsampling. Furthermore, we have devised a channel-attention-based iterative residual learning framework to address real-world depth map super-resolution. Extensive experiments across different benchmarks have demonstrated the superiority of our proposed approach over the state of the art.
Acknowledgment. The work is supported by Baidu Research. Yuchao Dai’s research is supported in part by the National Key Research and Development Program of China under Grant 2018AAA0102803 and Natural Science Foundation of China grants (61871325, 61420106007, 61671387), and Hongdong Li’s research is supported in part by the ARC Centre of Excellence for Robotic Vision (CE140100016) and ARC-Discovery (DP 190102261) and ARC-LIEF (190100080) grants. The authors from ANU gratefully acknowledge the GPUs donated by NVIDIA Corporation. We thank all anonymous reviewers and ACs for their constructive comments.
© 2020 IEEE.
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition