3D part guided image editing for fine-grained object understanding

Zongdai Liu, Feixiang Lu, Peng Wang, Hui Miao, Liangjun Zhang, Ruigang Yang, Bin Zhou

Research output: Contribution to journalConference articlepeer-review

7 Scopus citations


Holistically understanding an object with its 3D movable parts is essential for visual models of a robot to interact with the world. For example, only by understanding many possible part dynamics of other vehicles (e.g., door or trunk opening, taillight blinking for changing lane), a self-driving vehicle can be success in dealing with emergency cases. However, existing visual models tackle rarely on these situations, but focus on bounding box detection. In this paper, we fill this important missing piece in autonomous driving by solving two critical issues. First, for dealing with data scarcity, we propose an effective training data generation process by fitting a 3D car model with dynamic parts to cars in real images. This allows us to directly edit the real images using the aligned 3D parts, yielding effective training data for learning robust deep neural networks (DNNs). Secondly, to benchmark the quality of 3D part understanding, we collected a large dataset in real driving scenario with cars in uncommon states (CUS), i.e. with door or trunk opened etc., which demonstrates that our trained network with edited images largely outperforms other baselines in terms of 2D detection and instance segmentation accuracy.

Original languageEnglish
Article number9157345
Pages (from-to)11333-11342
Number of pages10
JournalProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
StatePublished - 2020
Event2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: Jun 14 2020Jun 19 2020

Bibliographical note

Publisher Copyright:
© 2020 IEEE

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition


Dive into the research topics of '3D part guided image editing for fine-grained object understanding'. Together they form a unique fingerprint.

Cite this