Abstract
In Autonomous Driving (AD), most 3D object detection frameworks, whether anchor-based or anchor-free, treat detection as a Bounding Box (BBox) regression problem. However, this compact representation is not sufficient to capture all the information about the objects. To tackle this problem, we propose a simple but practical detection framework that jointly predicts the 3D BBox and instance segmentation. For instance segmentation, we propose a Spatial Embeddings (SEs) strategy that assembles all foreground points around their corresponding object centers. Based on the SE results, object proposals can be generated with a simple clustering strategy. Since only one proposal is generated for each cluster, the Non-Maximum Suppression (NMS) process is no longer needed. Finally, with our proposed instance-aware ROI pooling, the BBox is refined by a second-stage network. Experimental results on the public KITTI dataset show that the proposed SEs significantly improve instance segmentation results compared with other feature-embedding-based methods. Meanwhile, the framework also outperforms most 3D object detectors on the KITTI testing benchmark.
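The abstract's pipeline (predict a per-point offset toward the object center, shift the foreground points, cluster the shifted points, and emit exactly one proposal per cluster so NMS is unnecessary) can be sketched as follows. This is a minimal illustration under assumed inputs, not the paper's implementation: `cluster_spatial_embeddings`, the greedy radius-based clustering, and the `radius` parameter are all hypothetical stand-ins.

```python
import numpy as np

def cluster_spatial_embeddings(points, offsets, radius=0.5):
    """Illustrative sketch (not the paper's code): shift each foreground
    point by its predicted offset toward the object center, then greedily
    cluster the shifted points. Each cluster yields exactly one proposal,
    which is why no NMS step is required afterwards."""
    shifted = points + offsets  # points assembled near their predicted object centers
    labels = -np.ones(len(points), dtype=int)
    centers = []  # running cluster centers in the shifted space
    for i, p in enumerate(shifted):
        for c, ctr in enumerate(centers):
            if np.linalg.norm(p - ctr) < radius:
                labels[i] = c
                break
        else:  # no existing cluster is close enough: start a new one
            labels[i] = len(centers)
            centers.append(p)
    # One axis-aligned box proposal per cluster, from the ORIGINAL points.
    proposals = [
        (points[labels == c].min(axis=0), points[labels == c].max(axis=0))
        for c in range(len(centers))
    ]
    return labels, proposals
```

With well-predicted offsets, points of one instance collapse to nearly the same location, so even this naive greedy pass separates instances; the paper's actual clustering and second-stage refinement are more sophisticated.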
Original language | English
---|---
Article number | 9156967
Pages (from-to) | 1836-1846
Number of pages | 11
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs |
State | Published - 2020
Event | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States. Duration: Jun 14 2020 → Jun 19 2020
Bibliographical note
Funding Information:Yuchao Dai's research was supported in part by the Natural Science Foundation of China grants (61871325, 61420106007, 61671387), and the National Key Research and Development Program of China under Grant 2018AAA0102803. Hongdong Li's research is funded in part by the ARC Centre of Excellence for Robotics Vision (CE140100016), ARC-Discovery (DP 190102261) and ARC-LIEF (190100080) grants, as well as a research grant from Baidu Research, Robotics and Autonomous Driving Laboratory (RAL). The authors from ANU gratefully acknowledge the GPUs donated by NVIDIA Corporation. We thank all anonymous reviewers and ACs for their constructive comments.
Publisher Copyright:
© 2020 IEEE
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition