Design and implementation of multi-feature fusion moving target detection algorithms in a complex environment based on SiamMask
-
Abstract: Moving target recognition in a complex environment is currently an important research direction in the field of image recognition. The present focus is how to track moving objects online in complex scenes so as to meet the real-time and reliability requirements of image tracking and subsequent processing. With the in-depth application of unmanned factories, intelligent safety supervision, and related technologies in manufacturing, dynamic recognition in complex environments, as represented by visual recognition warning systems, has become an important research topic in intelligent industry, and requirements of high reliability and real-time performance for moving target detection have been identified. In the industrial-grade visual recognition warning system described in this paper, the hair region of an operator is difficult to segment in real time because of its irregular shape and unpredictable motion. To solve this problem, a spatiotemporal predictive moving target tracking algorithm was proposed based on the SiamMask model. This algorithm fuses the SiamMask single-target tracking algorithm, built on the PyTorch deep learning framework, with ROI detection and the STC (spatiotemporal context) prediction algorithm. Through online learning of the target's spatiotemporal relationships, it predicts the new target location and corrects the SiamMask model, realizing fast target recognition in video sequences.
The experimental results show that the proposed algorithm can overcome the influence of environmental interference and target occlusion on tracking, reducing the target tracking misrecognition rate to 0.156%. The algorithm runs at 30 frames per second, 3.2 frames per second faster than the original SiamMask model, an efficiency improvement of 11.94%. The algorithm meets the accuracy and real-time requirements of the visual recognition warning system, and offers a useful reference for applying moving target recognition models in complex environments.
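The correction loop summarized in the abstract — a SiamMask-style tracker producing a per-frame target position, with an STC-style spatiotemporal model learned online predicting the next position and overriding the tracker when it drifts — can be sketched as follows. This is a minimal illustration only: constant-velocity extrapolation stands in for the STC confidence map, and a fixed drift threshold stands in for the model correction step; none of these functions are the authors' implementation.

```python
def stc_predict(history, decay=0.5):
    """Predict the next target center from recent centers via damped
    constant-velocity extrapolation (stand-in for the STC confidence map)."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # Extrapolate the last displacement, damped by the learning rate `decay`.
    return (x1 + decay * (x1 - x0), y1 + decay * (y1 - y0))


def fuse(tracker_center, stc_center, max_drift=20.0):
    """Accept the tracker output unless it drifts too far from the
    spatiotemporal prediction; otherwise fall back to the prediction
    (where the real system would re-correct the SiamMask model)."""
    dx = tracker_center[0] - stc_center[0]
    dy = tracker_center[1] - stc_center[1]
    if (dx * dx + dy * dy) ** 0.5 > max_drift:
        return stc_center
    return tracker_center


def track(tracker_outputs, init_center):
    """Run the fusion loop over a sequence of per-frame tracker centers."""
    history = [init_center]
    corrected = []
    for tracker_center in tracker_outputs:
        prediction = stc_predict(history)
        center = fuse(tracker_center, prediction)
        history.append(center)
        corrected.append(center)
    return corrected
```

For example, given per-frame tracker centers `[(1, 0), (2, 0), (100, 0), (4, 0)]` starting from `(0, 0)`, the outlier at frame 3 is replaced by the spatiotemporal prediction `(2.5, 0.0)` while the other frames pass through unchanged.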
-
Key words:
- deep learning
- complex environment
- moving target recognition
- SiamMask
- STC
-
Table 1. Statistics of target tracking effect of the SiamMask model
| Video No. | Frames with false detection | Cause of false detection | Total frames | Failure rate/% |
| --- | --- | --- | --- | --- |
| 1 | 0 | Little change in the movement | 361 | 0 |
| 2 | 87 | Misidentified as dark cloth | 288 | 30.21 |
| 3 | 98 | Part of the face is blocked by the hair | 192 | 51.04 |
| 4 | 674 | Initialization offset; a window pops up during recognition | 1380 | 48.84 |
| 5 | 131 | Target moves slightly out of frame and recognition is lost | 240 | 54.58 |
| 6 | 753 | Face occupies a large proportion of the initialization area | 1360 | 55.37 |
| 7 | 0 | Accurate initialization and small range of motion | 241 | 0 |
Table 2. Statistics of the target tracking effect of the spatiotemporal prediction algorithms based on the SiamMask model
| Video No. | Frames with false detection | Cause of false detection (as in Table 1) | Total frames | Failure rate/% |
| --- | --- | --- | --- | --- |
| 1 | 0 | Little change in the movement | 361 | 0 |
| 2 | 0 | Misidentified as dark cloth | 288 | 0 |
| 3 | 1 | Part of the face is blocked by the hair | 192 | 0.52 |
| 4 | 2 | Initialization offset; a window pops up during recognition | 1380 | 0.15 |
| 5 | 1 | Target moves slightly out of frame and recognition is lost | 240 | 0.42 |
| 6 | 0 | Face occupies a large proportion of the initialization area | 1360 | 0 |
| 7 | 0 | Accurate initialization and small range of motion | 241 | 0 |