Action recognition model based on the spatiotemporal sampling graph convolutional network and self-calibration mechanism

Abstract: A skeleton-based action recognition model built on a spatiotemporal sampling graph convolutional network and a self-calibration mechanism is proposed to address two shortcomings of existing action recognition algorithms: they disregard the contextual dependencies of spatiotemporal information and lack multilevel receptive fields for feature extraction. First, the working principles of ST-GCN, 3D-GCN, the Transformer, and the self-attention mechanism are reviewed, and it is shown that 3D-GCN and the Transformer cannot effectively model the global and local spatiotemporal contexts, respectively. Second, a spatiotemporal sampling graph convolutional network is proposed to model the spatiotemporal context effectively. Taking consecutive frames as spatiotemporal samples, this network divides the global action into multiple sub-actions; it establishes local cross-spatiotemporal dependencies by using a non-local network to compute the correlation between a single node and all nodes within the sampled frames, and establishes global cross-spatiotemporal dependencies by combining the non-local network with temporal convolution to compute the correlation between a single sampled sub-action and all sub-actions. Then, to enhance the multilevel receptive field and capture more discriminative temporal features, a temporal self-calibrating convolutional network is proposed that convolves in two space-times of different scales and fuses the resulting features: one is the space-time at the original scale, and the other is a latent space-time at a smaller scale obtained by downsampling; the latter adaptively establishes long-range spatiotemporal and channel dependencies and models inter-channel dependence by differentiating the features of each channel. Furthermore, the spatiotemporal sampling graph convolutional network and the temporal self-calibration network are combined into the proposed recognition model, which is trained end to end under a multi-stream network. Finally, experiments on skeleton-based action recognition are conducted on the NTU-RGB+D and NTU-RGB+D120 datasets: the recognition accuracy reaches 95.2% and 88.8% under the X-View and X-Sub benchmarks of NTU-RGB+D, respectively, and the generalization ability of the model is confirmed on NTU-RGB+D120. These results show that the proposed model has efficient spatiotemporal feature extraction, excellent recognition accuracy, and good generalization ability.
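
To make the spatiotemporal sampling idea concrete, the following is a minimal PyTorch-style sketch of the local cross-spatiotemporal dependency step: a non-local block applied within each window of consecutive frames, so that every joint attends to all joints of the same sub-action. The input layout (N, C, T, V), the window length tau, and the layer sizes are illustrative assumptions, not the paper's exact configuration.

# Illustrative sketch only; shapes and layer sizes are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class WindowedNonLocal(nn.Module):
    """Non-local block restricted to a window of `tau` consecutive frames (one sub-action)."""

    def __init__(self, channels: int, tau: int = 4):
        super().__init__()
        self.tau = tau                                   # frames per spatiotemporal sample
        inter = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inter, 1)       # query embedding
        self.phi = nn.Conv2d(channels, inter, 1)         # key embedding
        self.g = nn.Conv2d(channels, inter, 1)           # value embedding
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) skeleton features; assumes T is divisible by tau.
        n, c, t, v = x.shape
        w = t // self.tau
        # Split the sequence into w sub-actions of tau frames each.
        xw = x.view(n, c, w, self.tau, v).permute(0, 2, 1, 3, 4).reshape(n * w, c, self.tau, v)
        q = self.theta(xw).flatten(2)                    # (N*w, C', tau*V)
        k = self.phi(xw).flatten(2)
        g = self.g(xw).flatten(2)
        # Correlation of every node with all nodes inside the same window.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)          # (N*w, tau*V, tau*V)
        y = (g @ attn.transpose(1, 2)).view(n * w, -1, self.tau, v)  # aggregate weighted values
        y = self.out(y).view(n, w, c, self.tau, v).permute(0, 2, 1, 3, 4)
        return x + y.reshape(n, c, t, v)                 # residual connection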
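
Similarly, the temporal self-calibration described in the abstract can be sketched as follows, assuming a self-calibrated design in which one branch convolves at the original temporal scale while a second branch convolves in a down-sampled latent space-time and gates the first, after which the result is fused. The kernel size, pooling rate, and gating form are assumptions for illustration rather than the paper's exact design.

# Illustrative sketch only; kernel size, pooling rate, and gating are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalSelfCalibration(nn.Module):
    """Temporal convolution calibrated by a down-sampled (smaller-scale) latent space-time."""

    def __init__(self, channels: int, kernel_t: int = 9, pool_t: int = 4):
        super().__init__()
        pad = ((kernel_t - 1) // 2, 0)
        self.pool = nn.AvgPool2d(kernel_size=(pool_t, 1))   # temporal down-sampling
        self.conv_latent = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)
        self.conv_orig = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)
        self.conv_fuse = nn.Conv2d(channels, channels, (kernel_t, 1), padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) skeleton features.
        t, v = x.size(2), x.size(3)
        # Latent branch: convolve in the smaller-scale space-time, then restore length T.
        latent = F.interpolate(self.conv_latent(self.pool(x)), size=(t, v), mode='nearest')
        # Calibration: the latent branch adaptively gates the original-scale response.
        calibrated = self.conv_orig(x) * torch.sigmoid(x + latent)
        # Fuse the calibrated features at the original scale.
        return self.conv_fuse(calibrated)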

     
