3D point cloud semantic segmentation: state of the art and challenges

WANG Yixian; HU Yufan; KONG Qingqun; ZENG Hui; ZHANG Lixin; FAN Bin

doi:10.13374/j.issn2095-9389.2022.12.17.004

Volume 45 Issue 10

Oct. 2023


WANG Yixian, HU Yufan, KONG Qingqun, ZENG Hui, ZHANG Lixin, FAN Bin. 3D point cloud semantic segmentation: state of the art and challenges[J]. Chinese Journal of Engineering, 2023, 45(10): 1653-1665. doi: 10.13374/j.issn2095-9389.2022.12.17.004



WANG Yixian¹,
HU Yufan¹,
KONG Qingqun²,³,
ZENG Hui¹,
ZHANG Lixin¹,
FAN Bin¹

1. School of Intelligence Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
3. University of Chinese Academy of Sciences, Beijing 100049, China


Corresponding author: E-mail: bin.fan@ieee.org
Received Date: 2022-12-17
Available Online: 2023-02-27
Publish Date: 2023-10-25

Abstract


The falling cost of acquiring 3D point cloud data, coupled with rapid advances in GPU computing power, has increased the demand for 3D point cloud semantic segmentation in numerous 3D vision applications, including autonomous driving, industrial control, and MR/XR. This demand has in turn driven the development of deep learning methods for 3D point cloud semantic segmentation. Recently, many novel deep learning network architectures, such as RandLA-Net and Point Transformer, have been proposed; they have notably improved semantic segmentation accuracy while reducing the computational load. However, previous reviews of 3D point cloud semantic segmentation methods have focused primarily on relatively early works, whose approaches have been gradually abandoned over the years, and therefore do not accurately reflect the current state of research. Moreover, those reviews categorize methods by input data type, which makes it difficult to compare the segmentation performance of different techniques and does not give a comprehensive view of how methods built on different network architectures relate to one another. Therefore, this paper reviews the mainstream 3D semantic segmentation methods developed in the last three years, grouped by deep learning network architecture, and is organized into three levels. First, the two principal 3D point cloud data acquisition methods are introduced, together with their customary datasets and the metrics used to evaluate model performance. Second, 3D semantic segmentation methods based on different network architectures are systematically reviewed, followed by a statistical analysis of model performance on two commonly used 3D segmentation datasets, S3DIS and ScanNet. This analysis covers model structure relevance, strengths, and limitations. Finally, the remaining methodological and application challenges and potential research directions are discussed. In summary, this paper offers an extensive overview of the last three years of research progress in 3D point cloud semantic segmentation: it summarizes various network architecture pipelines, elucidates their fundamental operations, compares model performance across multiple architectures, discusses their notable strengths and limitations, and, most importantly, identifies the current challenges and promising research directions for future investigation. The analyses presented help researchers identify relevant work and research hotspots among different 3D point cloud semantic segmentation methods. The paper thereby aims to update existing reviews of the field, highlighting the key properties and contributions of the methods covered and suggesting promising research directions for the main challenges.
Keywords:
- 3D vision
- point cloud
- semantic segmentation
- deep learning
- network framework
