PLOT: a 3D point cloud object detection network for autonomous driving

Published online by Cambridge University Press: 16 January 2023

Yihuan Zhang, Liang Wang and Yifan Dai*
Affiliation (all authors): Intelligent Connected Vehicle Center, Tsinghua Automotive Research Institute, Suzhou, China
*Corresponding author.

Abstract

3D object detection using point clouds is an essential task for autonomous driving. With the development of roadside infrastructure, roadside perception can extend the view range of autonomous vehicles through communication technology. Computation time and power consumption are the two main concerns when deploying object detection, and a lightweight detection model running on an embedded system is a convenient solution for both the roadside and the vehicle side. In this study, a 3D Point cLoud Object deTection (PLOT) network is proposed to reduce the computational load and ensure real-time object detection on an embedded system. First, a bird's eye view (BEV) representation of the point cloud is computed using a pillar-based encoding method. Then, a backbone based on a cross-stage partial network (CSPNet) and a neck based on a feature pyramid network (FPN) generate a high-dimensional feature map. Finally, a multi-output head with a shared convolutional layer simultaneously predicts the classes, bounding boxes, and orientations of the objects. Extensive experiments on the Waymo Open Dataset and our own dataset demonstrate the accuracy and efficiency of the proposed method.
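The abstract does not detail PLOT's pillar encoder, so the following is only a minimal sketch of what a pillar-based BEV encoding step typically looks like, assuming a PointPillars-style design. Points are binned into vertical pillars on an x-y grid. Each point is then decorated with offsets to its pillar's point mean and geometric center, a shared per-point layer followed by max-pooling reduces each pillar to one feature vector, and the vectors are scattered back into a channels × H × W pseudo-image that a 2D backbone can consume. The grid range, pillar size, per-pillar point cap, and the random toy weights (standing in for a learned linear layer with batch normalization) are all hypothetical, and the usual cap on the number of non-empty pillars per frame is omitted.

```python
"""Minimal sketch of pillar-based BEV encoding (PointPillars-style).

Hypothetical parameters: grid ranges, pillar size, point cap, and the
random toy weights that stand in for a learned PointNet layer.
"""
from collections import defaultdict
import random


def grid_shape(x_range, y_range, pillar_size):
    """Return (H, W): number of pillars along y and x."""
    return (round((y_range[1] - y_range[0]) / pillar_size),
            round((x_range[1] - x_range[0]) / pillar_size))


def group_into_pillars(points, x_range, y_range, pillar_size, max_points=32):
    """Bin (x, y, z, intensity) points into vertical pillars on the x-y grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    ny, nx = grid_shape(x_range, y_range, pillar_size)
    pillars = defaultdict(list)
    for p in points:
        if not (x_min <= p[0] < x_max and y_min <= p[1] < y_max):
            continue  # drop points outside the detection range
        ix = int((p[0] - x_min) // pillar_size)
        iy = int((p[1] - y_min) // pillar_size)
        if ix >= nx or iy >= ny:
            continue  # guard against floating-point rounding at the far edge
        if len(pillars[(ix, iy)]) < max_points:
            pillars[(ix, iy)].append(p)  # points beyond the cap are dropped
    return pillars


def decorate(pts, ix, iy, x_min, y_min, pillar_size):
    """Append offsets to the pillar's point mean (3) and pillar center (2): 4-D -> 9-D."""
    n = len(pts)
    mx, my, mz = (sum(p[k] for p in pts) / n for k in range(3))
    cx = x_min + (ix + 0.5) * pillar_size
    cy = y_min + (iy + 0.5) * pillar_size
    return [list(p) + [p[0] - mx, p[1] - my, p[2] - mz, p[0] - cx, p[1] - cy]
            for p in pts]


def encode_pillar(decorated, weights):
    """Toy PointNet: shared linear layer + ReLU on every point, then max-pool over points."""
    return [max(max(0.0, sum(w * v for w, v in zip(row, pt))) for pt in decorated)
            for row in weights]


def pillars_to_bev(points, x_range=(0.0, 8.0), y_range=(-4.0, 4.0),
                   pillar_size=1.0, channels=4, seed=0):
    """Encode a point cloud into a channels x H x W pseudo-image (empty pillars stay 0)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(channels)]
    H, W = grid_shape(x_range, y_range, pillar_size)
    bev = [[[0.0] * W for _ in range(H)] for _ in range(channels)]
    pillars = group_into_pillars(points, x_range, y_range, pillar_size)
    for (ix, iy), pts in pillars.items():
        feats = encode_pillar(decorate(pts, ix, iy, x_range[0], y_range[0], pillar_size), weights)
        for c, f in enumerate(feats):
            bev[c][iy][ix] = f  # scatter the pillar feature back to its grid cell
    return bev


if __name__ == "__main__":
    cloud = [(0.5, -3.5, 0.0, 0.1), (0.6, -3.4, 0.2, 0.3),
             (5.2, 1.1, -0.5, 0.9), (20.0, 0.0, 0.0, 0.5)]  # last point is out of range
    bev = pillars_to_bev(cloud)
    print(len(bev), len(bev[0]), len(bev[0][0]))  # 4 8 8
```

Running the script prints `4 8 8`: four feature channels over an 8 × 8 pillar grid, with only the two occupied pillars holding non-zero features. The plain-Python loops only make the data flow explicit. A deployable encoder would learn the weights and vectorize these steps so they run efficiently on the target hardware.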

Type: Research Article
© The Author(s), 2023. Published by Cambridge University Press
