Improved YOLOv5s small target detection algorithm based on multi-attention

馬鴿; 李洪偉; 嚴梓維; 劉志杰; 趙志甲

doi:10.13374/j.issn2095-9389.2024.01.18.003

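Summary: In traffic sign recognition, most of the targets to be detected are small. Because small targets carry little information, demand high localization accuracy, and are easily swamped by environmental noise, they remain a major difficulty in traffic sign detection. To address missed detections, false detections, and low detection accuracy for small traffic signs, this paper designs STD-YOLOv5s (Small target detection YOLOv5s), a model for small target detection. First, adding upsampling and Prediction output layers yields richer location information, which resolves the YOLOv5s model's lack of information when handling small targets and strengthens its global understanding of the image. Second, a CA (Coordinate attention) attention mechanism is added after each C3 module and a Swin-T attention module is added before each output layer, which increases the network's capture of multilayer feature information and improves small target detection performance. Finally, the SIoU penalty function, which considers both target shape and spatial relationships, is used to better capture the positional relationships of targets of different sizes in the image and to improve localization accuracy. The proposed model was validated on the TT100K dataset. The experimental results show that the method not only retains the lightweight nature and speed of the YOLOv5s model but also improves precision, recall, and average precision, raising the accuracy of small target detection.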

Abstract: Traffic sign detection and recognition enables real-time monitoring and interpretation of traffic signs on the road, such as those indicating speed limits, prohibition of overtaking, and navigation cues. It has important applications in autonomous driving and decision-making systems. Consequently, designing accurate and efficient algorithms for the automatic recognition of traffic signs is crucial in the intelligent transportation field. However, the targets that traffic sign recognition applications must detect are mostly small, which makes automatic recognition challenging. The YOLOv5s model, characterized by its minimal depth and narrowest feature maps, is widely used for detection because it is lightweight and easily portable. Furthermore, YOLOv5s follows an anchor-based prediction approach in which anchor boxes of different sizes and shapes are used to regress and classify various targets. This method generates dense anchor boxes and lets the model perform object classification and bounding box regression directly, thereby enhancing its target recall capability. The anchor-based YOLOv5s method has therefore been applied to traffic sign detection; however, it suffers from false positives and missed detections. Small target detection remains a challenging aspect of current traffic sign recognition technology for three reasons: small targets carry little information, their detection requires high positioning precision, and environmental noise can overwhelm them. To overcome these issues of missed detection, false positives, and low detection accuracy, this study proposes a model called STD-YOLOv5s that is specifically designed for small target detection. First, by increasing the number of upsampling and prediction output layers, the model obtains richer location information. This enhances the global understanding of images and addresses the insufficient information associated with small targets. Second, the CA attention mechanism is added after each C3 module and the Swin-T attention mechanism module is added before each output layer, which increases the model's ability to capture multilayer feature information and consequently improves its small target detection performance. Finally, target localization accuracy is improved using the SIoU penalty function, which considers target shape and spatial relationships, thereby increasing the model's ability to capture the positional relationships among targets of different sizes in the image. The STD-YOLOv5s model was validated on the TT100K dataset through ablation and comparison experiments. Experimental results indicate that the proposed model not only maintains the lightweight nature and high detection speed of the YOLOv5s model but also achieves improvements in precision, recall, and average precision.
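To make the role of the SIoU penalty more concrete, the following is a minimal sketch of an SIoU-style loss for a single pair of axis-aligned boxes, written in plain Python. It follows the commonly published SIoU formulation (an IoU term plus angle-aware distance and shape costs) and is not taken from the STD-YOLOv5s implementation, which the abstract does not detail. The function name siou_loss, the (x1, y1, x2, y2) box layout, the shape exponent theta = 4, and the eps stabilizer are all assumptions made for illustration.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """Illustrative SIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper: the name, box layout, theta and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between the two box centers and its length.
    dx = (tx1 + tx2) / 2.0 - (px1 + px2) / 2.0
    dy = (ty1 + ty2) / 2.0 - (py1 + py2) / 2.0
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 1 - 2*sin^2(alpha - pi/4), where alpha is the angle between
    # the center-to-center line and the nearer coordinate axis.
    sin_alpha = min(min(abs(dx), abs(dy)) / sigma, 1.0)
    angle_cost = 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4.0) ** 2

    # Distance cost, weighted by the angle cost through gamma.
    gamma = 2.0 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes relative mismatch in width and height.
    omega_w = abs(pw - tw) / max(pw, tw, eps)
    omega_h = abs(ph - th) / max(ph, th, eps)
    shape_cost = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + (distance_cost + shape_cost) / 2.0


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss((1.0, 0.0, 11.0, 10.0), ground_truth))  # small horizontal shift
    print(siou_loss((3.0, 0.0, 13.0, 10.0), ground_truth))  # larger horizontal shift
```

In this sketch the loss approaches zero when the predicted box coincides with the ground truth and grows as the center offset or the size mismatch increases, which is the behavior the abstract attributes to considering target shape and spatial relationships together. A training implementation would operate on batched tensors rather than single tuples.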

Improved small target detection algorithm based on multiattention and YOLOv5s for traffic sign recognition