<th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
<progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
<th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
<progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
<th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
<progress id="5nh9l"><noframes id="5nh9l">

Automatic data collection and annotation system for a pose estimation dataset designed for grasping detection

  • Abstract: Robotic grasping is widely used in fields such as logistics sorting, automated assembly, and medical surgery. Grasp detection is one of the key steps in robotic grasping; as the cost of 3D sensors falls, grasp detection increasingly uses depth cameras to capture paired color and depth (RGB-D) images and adopts pose estimation-based methods to realize robotic grasping. However, most publicly available RGB-D pose estimation datasets require an expensive 3D laser scanner to obtain the 3D model of each target object, and their annotation relies on manual work that is time-consuming and labor-intensive, hindering the construction of large-scale datasets. To address this, this paper designs and implements an automatic data collection and annotation system for pose estimation. Without a 3D laser scanner, the system reconstructs the 3D model of a target object solely by capturing and analyzing RGB-D image sequences from a depth camera, automatically annotates the object's pose, and generates segmentation masks for the 2D images. In the experiments, the system was used to build a pose estimation dataset of 84 objects and 8400 RGB-D images. Comparing the automatic annotations with manual annotations shows a segmentation-mask overlap rate of 98%, and the automatically annotated poses align the model point cloud with the scene point cloud at a 100% alignment rate, demonstrating the accuracy and reliability of the system's automatic annotation results.
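The two figures quoted above (98% mask overlap, 100% alignment rate) correspond to checks that are straightforward to express in code. The paper does not spell out its exact formulas, so the following Python sketch uses common definitions; the function names, the distance and inlier thresholds, and the NumPy/SciPy implementation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def mask_overlap_rate(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Fraction of the manually labeled mask covered by the automatic mask.

    Both inputs are boolean H x W arrays. The paper reports a 98% overlap
    rate but does not state the formula; intersection-over-union would be
    an equally plausible definition.
    """
    intersection = np.logical_and(auto_mask, manual_mask).sum()
    return intersection / manual_mask.sum()

def pose_aligns(model_pts: np.ndarray, scene_pts: np.ndarray,
                R: np.ndarray, t: np.ndarray,
                dist_thresh: float = 0.005, inlier_ratio: float = 0.9) -> bool:
    """Check that an annotated pose (R, t) aligns the model point cloud with
    the scene point cloud: transform the model into the scene frame and
    require that most transformed points have a scene point nearby.

    dist_thresh (metres) and inlier_ratio are illustrative; the paper's
    alignment criterion is not specified.
    """
    transformed = model_pts @ R.T + t          # Nx3 model points into scene frame
    dists, _ = cKDTree(scene_pts).query(transformed)
    return float(np.mean(dists < dist_thresh)) >= inlier_ratio
```

Averaging `mask_overlap_rate` over all 8400 images, and counting the fraction of images for which `pose_aligns` succeeds, would yield metrics analogous to the two headline numbers.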


Abstract: Robotic grasping has extensive applications in fields such as logistics sorting, automated assembly, and medical surgery. Grasp detection is an important step in robotic grasping. Recently, as their cost has fallen, depth cameras have been increasingly applied to grasp detection, which has promoted pose estimation-based methods for robotic grasping. However, most publicly available RGB-D image-based pose estimation datasets rely on expensive equipment such as 3D laser scanners to obtain 3D models of the target objects, and their annotation depends heavily on manual operation, which is time-consuming, labor-intensive, and ill-suited to creating large-scale datasets. To address these issues, this study implements an automatic dataset acquisition and annotation system aimed at developing RGB-D image-based pose estimation methods for robotic grasping. The proposed system is easy to deploy and does not require an expensive 3D laser scanner: RGB-D image sequences are captured with an off-the-shelf depth camera alone, from which the system automatically produces the reconstructed 3D model of the target object, annotated pose information, and 2D image segmentation masks. As part of the system's automatic annotation algorithm, a novel minimum spanning tree-based normal propagation method is proposed to guarantee consistent normal directions, so that the deformation or tearing of the reconstructed 3D surface caused by inconsistent normal directions is avoided. In the experiments, the proposed system created a pose estimation dataset containing 84 objects with 8400 RGB-D images; for each object, the system annotated the 3D model, the image segmentation mask, and the 6D pose in every RGB-D image. To evaluate the accuracy of the annotated segmentation masks, they were compared with the corresponding manually labeled masks, and the annotation quality was further assessed through the performance of an instance segmentation network trained on the annotated masks. To evaluate the accuracy of the annotated poses, a point cloud registration task was performed that aligns the model point cloud with the scene point cloud using the annotated pose parameters; in addition, a category-level pose estimation network was trained on the annotated poses, and its performance directly reflects the accuracy of the annotation results. The experimental results show that the overlap between the annotated and manually labeled masks exceeds 98%, and a 100% alignment rate is achieved, meaning that the model point cloud can be aligned to every scene point cloud through the corresponding annotated pose parameters. These results demonstrate that the designed and implemented system can create high-quality datasets for developing practical pose estimation solutions, providing a solid data foundation for future research on and application of deep learning models for robotic grasp detection.
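The abstract credits a minimum spanning tree-based normal propagation step with keeping normal directions consistent during reconstruction. The paper's novel variant is not described here, but the classic MST propagation it builds on (Hoppe et al., 1992) is easy to sketch: connect neighboring points, weight edges so that nearly parallel normals are cheap to traverse, and flip normals while walking the tree. The Python/SciPy code below is a minimal sketch under those classic assumptions, not the authors' algorithm.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order
from scipy.spatial import cKDTree

def orient_normals_mst(points: np.ndarray, normals: np.ndarray, k: int = 10) -> np.ndarray:
    """Propagate a consistent normal orientation over a point cloud.

    Builds a k-nearest-neighbour graph whose edge weight 1 - |n_i . n_j|
    is small where normals are nearly parallel, takes its minimum spanning
    tree, and flips each normal to agree with its parent while walking the
    tree from an arbitrary root. Assumes the k-NN graph is connected.
    """
    n = len(points)
    _, nbrs = cKDTree(points).query(points, k=k + 1)   # column 0 is the point itself
    rows, cols, weights = [], [], []
    for i in range(n):
        for j in nbrs[i, 1:]:
            # Epsilon keeps perfectly parallel normals from producing a zero
            # weight, which sparse-graph routines would treat as "no edge".
            rows.append(i)
            cols.append(j)
            weights.append(1.0 - abs(np.dot(normals[i], normals[j])) + 1e-8)
    graph = csr_matrix((weights, (rows, cols)), shape=(n, n))
    mst = minimum_spanning_tree(graph)
    order, parents = breadth_first_order(mst, i_start=0, directed=False)
    oriented = normals.copy()
    for node in order[1:]:                 # root keeps its given orientation
        if np.dot(oriented[node], oriented[parents[node]]) < 0.0:
            oriented[node] = -oriented[node]
    return oriented
```

Open3D ships a comparable routine (`PointCloud.orient_normals_consistent_tangent_plane`), and a practical pipeline would additionally seed the root normal from the camera viewpoint. The tearing the abstract mentions is exactly what happens when a surface reconstruction method such as Poisson reconstruction is fed normals with mixed orientations.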

<th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
<progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
<th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
<progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
<th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
<progress id="5nh9l"><noframes id="5nh9l">
259luxu-164