Design and implementation of multi-feature fusion moving target detection algorithms in a complex environment based on SiamMask
-
Abstract: Moving target recognition in a complex environment is currently an important research direction in the field of image recognition. The present focus is how to track moving objects online in complex scenes so as to meet the real-time and reliability requirements of image tracking and subsequent processing. With the in-depth application of unmanned factories, intelligent safety supervision, and related technologies in manufacturing, dynamic recognition in complex environments, as represented by visual recognition warning systems, has become an important research topic in intelligent industry, and requirements of high reliability and real-time performance for moving target detection have been identified. In the industrial-grade visual recognition warning system described in this paper, the hair region of an operator is difficult to segment in real time because of its irregular shape and unpredictable motion. To solve this problem, a spatiotemporal predictive moving target tracking algorithm was proposed based on the SiamMask model. This algorithm fuses the SiamMask single-target tracking algorithm, built on the PyTorch deep learning framework, with ROI detection and the STC (spatiotemporal context) prediction algorithm. Through online learning of the target's spatiotemporal relationships, it predicts the new target location and corrects the SiamMask model, realizing fast target recognition in video sequences.
The experimental results show that the proposed algorithm can overcome the influence of environmental interference and target occlusion on tracking, reducing the target tracking misrecognition rate to 0.156%. The algorithm runs at 30 frames per second, 3.2 frames per second faster than the original SiamMask model, an efficiency improvement of 11.94%. The algorithm meets the accuracy and real-time requirements of the visual recognition warning system, and offers a useful reference for applying moving target recognition models in complex environments.
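The correction loop summarized in the abstract — a SiamMask-style tracker producing a per-frame target position, with an STC-style spatiotemporal model learned online predicting the next position and overriding the tracker when it drifts — can be sketched as follows. This is a minimal illustration only: constant-velocity extrapolation stands in for the STC confidence map, and a fixed drift threshold stands in for the model correction step; none of these functions are the authors' implementation.

```python
def stc_predict(history, decay=0.5):
    """Predict the next target center from recent centers via damped
    constant-velocity extrapolation (stand-in for the STC confidence map)."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # Extrapolate the last displacement, damped by the learning rate `decay`.
    return (x1 + decay * (x1 - x0), y1 + decay * (y1 - y0))


def fuse(tracker_center, stc_center, max_drift=20.0):
    """Accept the tracker output unless it drifts too far from the
    spatiotemporal prediction; otherwise fall back to the prediction
    (where the real system would re-correct the SiamMask model)."""
    dx = tracker_center[0] - stc_center[0]
    dy = tracker_center[1] - stc_center[1]
    if (dx * dx + dy * dy) ** 0.5 > max_drift:
        return stc_center
    return tracker_center


def track(tracker_outputs, init_center):
    """Run the fusion loop over a sequence of per-frame tracker centers."""
    history = [init_center]
    corrected = []
    for tracker_center in tracker_outputs:
        prediction = stc_predict(history)
        center = fuse(tracker_center, prediction)
        history.append(center)
        corrected.append(center)
    return corrected
```

For example, given per-frame tracker centers `[(1, 0), (2, 0), (100, 0), (4, 0)]` starting from `(0, 0)`, the outlier at frame 3 is replaced by the spatiotemporal prediction `(2.5, 0.0)` while the other frames pass through unchanged.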
-
Key words:
- deep learning
- complex environment
- moving target recognition
- SiamMask
- STC
-
Table 1. Statistics of target tracking effect of the SiamMask model
| Video No. | Frames with false detection | Cause of false detection | Total frames | Failure rate/% |
| --- | --- | --- | --- | --- |
| 1 | 0 | Little change in the movement | 361 | 0 |
| 2 | 87 | Misidentified as dark cloth | 288 | 30.21 |
| 3 | 98 | Part of the face is blocked by the hair | 192 | 51.04 |
| 4 | 674 | Initialization offset; a window pops up during recognition | 1380 | 48.84 |
| 5 | 131 | Target moves slightly out of frame and recognition is lost | 240 | 54.58 |
| 6 | 753 | Face occupies a large proportion of the initialization area | 1360 | 55.37 |
| 7 | 0 | Accurate initialization and small range of motion | 241 | 0 |
Table 2. Statistics of the target tracking effect of the spatiotemporal prediction algorithms based on the SiamMask model
| Video No. | Frames with false detection | Cause of false detection (as in Table 1) | Total frames | Failure rate/% |
| --- | --- | --- | --- | --- |
| 1 | 0 | Little change in the movement | 361 | 0 |
| 2 | 0 | Misidentified as dark cloth | 288 | 0 |
| 3 | 1 | Part of the face is blocked by the hair | 192 | 0.52 |
| 4 | 2 | Initialization offset; a window pops up during recognition | 1380 | 0.15 |
| 5 | 1 | Target moves slightly out of frame and recognition is lost | 240 | 0.42 |
| 6 | 0 | Face occupies a large proportion of the initialization area | 1360 | 0 |
| 7 | 0 | Accurate initialization and small range of motion | 241 | 0 |