<th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
<progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
<th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
<progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
<th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
<progress id="5nh9l"><noframes id="5nh9l">

An ODE-YOLO-Based Safety Wear Detection Model for Workers in Cement Aggregate Production Workshops

LI Xin, HU Mangu, TONG Ruipeng

  • Abstract: Cement aggregate production workshops are harsh environments with hazards such as heavy dust and flying crushed stone, so workers must wear safety helmets, masks, reflective vests, and other safety equipment while on the job. To ensure compliant operation and protect workers' personal safety, carrying out safety wear detection has become an industry consensus. AI video analysis is one of the effective means of detecting workers' safety wear, but it faces the challenges of multi-scale and small targets, and existing detection algorithms still suffer from low detection accuracy, high missed-detection rates, and poor real-time performance. To this end, this paper proposes ODE-YOLO, a multi-scale small-object detection algorithm based on an improved YOLOv8. First, on top of the YOLOv8 baseline model, the ODConv module is introduced to strengthen the network's feature extraction capability, and the EMA attention mechanism is adopted to improve the representation of multi-scale targets. Second, EMA is optimized and restructured with the iRMB architecture to resolve the low post-processing efficiency that EMA introduces, yielding the iEMA attention method and striking a balance between efficiency and performance. Finally, a dataset of 9,877 multi-scale small-object samples was prepared from video images captured at different times of day and from different camera positions in the cement aggregate production workshop of a mine, and performance evaluation experiments on the ODE-YOLO algorithm were conducted. Experiments with different attention mechanisms show that EMA is the most effective at improving feature extraction and representation for multi-scale targets; ablation and comparison experiments show that ODE-YOLO effectively improves the detection accuracy of multi-scale small objects and, with a small parameter count and low computational cost, reaches an mAP@0.5 of 0.868 and a small-object precision AP@0.5mask of 0.722, while offering fast inference and low post-processing latency, enabling real-time and accurate safety wear detection for workers in cement aggregate production workshops.

     

Abstract: In hazardous environments such as cement aggregate production plants, workers are required to wear safety equipment including helmets, masks, and reflective vests to mitigate the risks of heavy dust and flying debris. However, non-compliance with safety gear requirements remains prevalent, contributing to frequent workplace accidents. Manual supervision proves inefficient due to environmental limitations. As a result, the deployment of AI-based video analysis for real-time safety wear detection has become increasingly vital. Yet this task presents significant challenges, particularly the presence of small objects and multi-scale targets in complex scenes, which compromise detection accuracy, increase false negative rates, and hinder real-time performance.

To address these issues, this study proposes a novel multi-scale small object detection algorithm, ODE-YOLO, built upon the YOLOv8 architecture. The core innovation lies in integrating the Omni-Dimensional Dynamic Convolution (ODConv) module into the shallow layers of the backbone to enhance feature extraction for small objects, and embedding an improved attention mechanism, iEMA (inverted Efficient Multi-scale Attention), within the neck network to strengthen multi-scale feature representation while preserving real-time inference performance. The EMA module, known for its multi-scale parallel structure and spatial attention capabilities, was modified using an inverted residual mobile block (iRMB) to form iEMA. This structure balances efficiency and accuracy by reusing features, reducing computation, and eliminating the need for the complex matrix operations found in traditional self-attention mechanisms. The combination of ODConv and iEMA allows the model to better capture contextual cues across varying object scales, especially for hard-to-detect categories such as masks and unhelmeted heads.

A customized dataset comprising 9,877 labeled instances was created using surveillance footage from multiple workstations in a cement plant, covering various time periods and camera angles. The dataset includes six categories: vest, no-vest, helmet, head, mask, and no-mask. Statistical analysis revealed a strong presence of small and scale-diverse targets, with some classes occupying less than 0.5% of the image area. Training was conducted using PyTorch 2.0.0 on an NVIDIA RTX 3090 GPU. A comprehensive series of experiments was carried out, including attention mechanism comparisons, ablation studies, and benchmarking against state-of-the-art models such as YOLOv5n, YOLOv10n, Faster R-CNN, Mask R-CNN, and RT-DETR-L.

The results demonstrate that the proposed ODE-YOLO outperforms the other YOLO variants and R-CNN models in terms of mean average precision (mAP@0.5 = 0.868) and small object detection precision (AP@0.5mask = 0.722), while maintaining a lightweight architecture (11.3 MB) and fast inference (2.2 ms per image). The iEMA attention mechanism outperformed other mainstream attention modules (SE, CBAM, CA), improving the precision of mask detection by 28.5% over the baseline. Ablation experiments confirmed the individual and combined contributions of ODConv and iEMA to both accuracy and speed, evidencing their synergistic effect. Visual inspection of real-world test images showed that ODE-YOLO achieved balanced detection across object scales without missed detections or misclassifications, making it well suited for real-time deployment in production environments.
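The iEMA design summarized above (an EMA-style multi-scale attention restructured with an inverted residual mobile block) can be pictured with a short PyTorch sketch. The code below is an illustrative reconstruction, not the authors' implementation: SimplifiedEMA keeps only the grouped directional pooling and the local 3x3 branch of the published EMA design (omitting its cross-spatial matmul interaction), and all class and parameter names here are assumptions.

import torch
import torch.nn as nn

class SimplifiedEMA(nn.Module):
    # EMA-like attention (assumed simplification): grouped channels,
    # directional pooling along H and W, plus a local 3x3 branch.
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        c = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(c, c)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        g = x.reshape(b * self.groups, ch // self.groups, h, w)
        xh = self.pool_h(g)                        # (b*g, c, h, 1)
        xw = self.pool_w(g).permute(0, 1, 3, 2)    # (b*g, c, w, 1)
        attn = self.conv1x1(torch.cat([xh, xw], dim=2))
        ah, aw = torch.split(attn, [h, w], dim=2)
        # Directional attention weights recalibrate the grouped features.
        g1 = self.gn(g * ah.sigmoid() * aw.permute(0, 1, 3, 2).sigmoid())
        g2 = self.conv3x3(g)                       # local multi-scale detail
        return (g1 + g2).reshape(b, ch, h, w)

class iEMA(nn.Module):
    # Inverted-residual wrapper (iRMB-style): expand -> attention ->
    # depthwise conv -> project, with a skip connection.
    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        mid = channels * expand
        self.expand = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.SiLU())
        self.attn = SimplifiedEMA(mid)
        self.dw = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid), nn.SiLU())
        self.project = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.project(self.dw(self.attn(self.expand(x))))

Since the output shape matches the input (for example, iEMA(64)(torch.randn(1, 64, 80, 80)) returns a 1x64x80x80 tensor), a block like this can be dropped into a neck stage without changing channel counts, which is what makes such an attention module easy to embed in a YOLOv8-style network.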
In conclusion, this study introduces a robust and efficient algorithm tailored for safety wear detection in industrial scenarios characterized by multi-scale and small object challenges. ODE-YOLO provides a practical tool for enhancing workplace safety supervision, offering timely alerts for non-compliance, and supporting safety management personnel in mitigating risks and preventing accidents.
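For context, a training run matching the setup summarized above could look like the following sketch, using the public Ultralytics YOLOv8 API. The file names ("safety_wear.yaml" for the dataset, the baseline "yolov8n.yaml" model) and the epoch count are illustrative placeholders, not values reported by the paper, whose ODE-YOLO model definition is not reproduced here.

from ultralytics import YOLO

# A hypothetical dataset config ("safety_wear.yaml") would list the six
# classes used in the paper: vest, no-vest, helmet, head, mask, no-mask.
model = YOLO("yolov8n.yaml")      # baseline; ODE-YOLO modifies backbone and neck
model.train(
    data="safety_wear.yaml",      # placeholder dataset config
    epochs=300,                   # assumed; not stated in the abstract
    imgsz=640,                    # assumed input size
    device=0,                     # e.g. a single RTX 3090, as in the paper
)
metrics = model.val()             # reports mAP@0.5 among other metrics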

     

<th id="5nh9l"></th><strike id="5nh9l"></strike><th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th><strike id="5nh9l"></strike>
<progress id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"><noframes id="5nh9l">
<th id="5nh9l"></th> <strike id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span>
<progress id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"><noframes id="5nh9l"><span id="5nh9l"></span><strike id="5nh9l"><noframes id="5nh9l"><strike id="5nh9l"></strike>
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"><noframes id="5nh9l">
<span id="5nh9l"></span><span id="5nh9l"><video id="5nh9l"></video></span>
<th id="5nh9l"><noframes id="5nh9l"><th id="5nh9l"></th>
<progress id="5nh9l"><noframes id="5nh9l">
259luxu-164