TY - GEN
T1 - Semantic segmentation of urban scenes using dense depth maps
AU - Zhang, Chenxi
AU - Wang, Liang
AU - Yang, Ruigang
PY - 2010
Y1 - 2010
AB - In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five view-independent 3D features that vary with object class are extracted from dense depth maps at the superpixel level and used to train a classifier with the randomized decision forest technique. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street-scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The results show that, using only dense depth information, we achieve more accurate segmentation and recognition overall than with sparse 3D features or appearance alone, or even their combination, advancing the state of the art. Furthermore, by aligning dense depth-based 3D features into a unified coordinate frame, our algorithm can handle the special case of viewpoint changes between training and testing scenarios. Preliminary cross-training-and-testing evaluation shows promising results.
UR - http://www.scopus.com/inward/record.url?scp=78149338300&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149338300&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15561-1_51
DO - 10.1007/978-3-642-15561-1_51
M3 - Conference contribution
AN - SCOPUS:78149338300
SN - 364215560X
SN - 9783642155604
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 708
EP - 721
BT - Computer Vision, ECCV 2010 - 11th European Conference on Computer Vision, Proceedings
T2 - 11th European Conference on Computer Vision, ECCV 2010
Y2 - 10 September 2010 through 11 September 2010
ER -