Action recognition model based on the spatiotemporal sampling graph convolutional network and self-calibration mechanism

Abstract: A skeleton-based action recognition model built on a spatiotemporal sampling graph convolutional network and a self-calibration mechanism is proposed to address two shortcomings of existing action recognition algorithms: they disregard the contextual dependencies of spatiotemporal information and lack multilevel receptive fields for feature extraction. First, the working principles of ST-GCN, 3D-GCN, the Transformer, and the self-attention mechanism are reviewed, and it is shown that 3D-GCN and the Transformer cannot effectively model the global and local spatiotemporal contexts, respectively. Second, a spatiotemporal sampling graph convolutional network is proposed to model the spatiotemporal context effectively. Taking consecutive frames as spatiotemporal samples, this network divides the global action into multiple sub-actions; it establishes local cross-spatiotemporal dependencies by using a non-local network to compute the correlation between a single node and all nodes within the sampled frames, and establishes global cross-spatiotemporal dependencies by combining the non-local network with temporal convolution to compute the correlation between a single sampled sub-action and all sub-actions. Then, to enhance the multilevel receptive field and capture more discriminative temporal features, a temporal self-calibrating convolutional network is proposed that convolves in two space-times of different scales and fuses the resulting features: one is the space-time at the original scale, and the other is a latent space-time at a smaller scale obtained by downsampling; the latter adaptively establishes long-range spatiotemporal and channel dependencies and models inter-channel dependence by differentiating the features of each channel. Furthermore, the spatiotemporal sampling graph convolutional network and the temporal self-calibration network are combined into the proposed recognition model, which is trained end to end under a multi-stream network. Finally, experiments on skeleton-based action recognition are conducted on the NTU-RGB+D and NTU-RGB+D120 datasets: the recognition accuracy reaches 95.2% and 88.8% under the X-View and X-Sub benchmarks of NTU-RGB+D, respectively, and the generalization ability of the model is confirmed on NTU-RGB+D120. These results show that the proposed model has efficient spatiotemporal feature extraction, excellent recognition accuracy, and good generalization ability.
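
To make the spatiotemporal sampling idea concrete, the following is a minimal PyTorch-style sketch of the local cross-spatiotemporal dependency step: a non-local block applied within each window of consecutive frames, so that every joint attends to all joints of the same sub-action. The input layout (N, C, T, V), the window length tau, and the layer sizes are illustrative assumptions, not the paper's exact configuration.

# Illustrative sketch only; shapes and layer sizes are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class WindowedNonLocal(nn.Module):
    """Non-local block restricted to a window of `tau` consecutive frames (one sub-action)."""

    def __init__(self, channels: int, tau: int = 4):
        super().__init__()
        self.tau = tau                                   # frames per spatiotemporal sample
        inter = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inter, 1)       # query embedding
        self.phi = nn.Conv2d(channels, inter, 1)         # key embedding
        self.g = nn.Conv2d(channels, inter, 1)           # value embedding
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) skeleton features; assumes T is divisible by tau.
        n, c, t, v = x.shape
        w = t // self.tau
        # Split the sequence into w sub-actions of tau frames each.
        xw = x.view(n, c, w, self.tau, v).permute(0, 2, 1, 3, 4).reshape(n * w, c, self.tau, v)
        q = self.theta(xw).flatten(2)                    # (N*w, C', tau*V)
        k = self.phi(xw).flatten(2)
        g = self.g(xw).flatten(2)
        # Correlation of every node with all nodes inside the same window.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)          # (N*w, tau*V, tau*V)
        y = (g @ attn.transpose(1, 2)).view(n * w, -1, self.tau, v)  # aggregate weighted values
        y = self.out(y).view(n, w, c, self.tau, v).permute(0, 2, 1, 3, 4)
        return x + y.reshape(n, c, t, v)                 # residual connection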
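
Similarly, the temporal self-calibration described in the abstract can be sketched as follows, assuming a self-calibrated design in which one branch convolves at the original temporal scale while a second branch convolves in a down-sampled latent space-time and gates the first, after which the result is fused. The kernel size, pooling rate, and gating form are assumptions for illustration rather than the paper's exact design.

# Illustrative sketch only; kernel size, pooling rate, and gating are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalSelfCalibration(nn.Module):
    """Temporal convolution calibrated by a down-sampled (smaller-scale) latent space-time."""

    def __init__(self, channels: int, kernel_t: int = 9, pool_t: int = 4):
        super().__init__()
        pad = ((kernel_t - 1) // 2, 0)
        self.pool = nn.AvgPool2d(kernel_size=(pool_t, 1))   # temporal down-sampling
        self.conv_latent = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)
        self.conv_orig = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)
        self.conv_fuse = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) skeleton features.
        t, v = x.size(2), x.size(3)
        # Latent branch: convolve in the smaller-scale space-time, then restore length T.
        latent = F.interpolate(self.conv_latent(self.pool(x)), size=(t, v), mode='nearest')
        # Calibration: the latent branch adaptively gates the original-scale response.
        calibrated = self.conv_orig(x) * torch.sigmoid(x + latent)
        # Fuse the calibrated features at the original scale.
        return self.conv_fuse(calibrated)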

     
