Abstract:
Existing action recognition algorithms ignore both the contextual dependencies within spatio-temporal information and the dependencies between spatio-temporal information and channels. To address this problem, this paper proposes a skeleton-based action recognition model: a spatio-temporal sampling graph convolutional network based on a self-calibration mechanism. First, the principles of ST-GCN, 3D-GCN, the Transformer, and the self-attention mechanism are introduced. Second, a spatio-temporal sampling graph convolutional network is proposed, which takes multiple consecutive frames as a spatio-temporal sample and constructs a spatio-temporal adjacency matrix for use in graph convolution, thereby establishing local and global spatio-temporal context dependencies. Third, to establish the dependencies between the spatio-temporal dimensions and the channels effectively, and to enlarge the multi-level receptive field so that more discriminative temporal features are captured, a temporal self-calibrating convolutional network is proposed. It convolves and fuses features in two spaces of different scale: the original-scale spatio-temporal space and a smaller-scale latent spatio-temporal space obtained by downsampling. The spatio-temporal sampling graph convolutional network and the temporal self-calibrating network are then combined into the proposed action recognition model, which is trained end to end on a multi-stream network. Finally, skeleton-based action recognition experiments are carried out on the NTU-RGB+D and NTU-RGB+D120 datasets. The results verify that the model extracts spatio-temporal features effectively and achieves high recognition accuracy.
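
The idea of treating several consecutive frames as one spatio-temporal sample can be illustrated with a small sketch. The abstract does not specify how the spatio-temporal adjacency matrix is built, so the construction below is only an assumption: the single-frame skeleton adjacency (with self-loops) is tiled across every pair of frames in the sample and then symmetrically normalized. The function names `spatio_temporal_adjacency` and `aggregate`, and the 3-joint toy skeleton, are hypothetical and used for illustration only.

```python
import math

def spatio_temporal_adjacency(adj, num_frames):
    """Tile a V x V skeleton adjacency over a window of frames.

    Assumed construction (not taken from the paper): node p = frame * V + joint,
    and joint i in any frame is linked to joint j in any frame of the window
    whenever i and j are adjacent (or identical) in the skeleton.
    """
    v = len(adj)
    # Add self-loops to the single-frame skeleton graph.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(v)]
             for i in range(v)]
    n = v * num_frames
    # Repeat the V x V block for every (frame, frame) pair.
    a_st = [[a_hat[p % v][q % v] for q in range(n)] for p in range(n)]
    # Symmetric normalization: D^(-1/2) A D^(-1/2).
    deg = [sum(row) for row in a_st]
    return [[a_st[p][q] / math.sqrt(deg[p] * deg[q]) for q in range(n)]
            for p in range(n)]

def aggregate(a_st, feats):
    """Neighborhood aggregation A_st @ X for features of shape (n, channels)."""
    n, c = len(feats), len(feats[0])
    return [[sum(a_st[p][q] * feats[q][k] for q in range(n)) for k in range(c)]
            for p in range(n)]

# Toy skeleton: a chain of 3 joints, sampled over 3 consecutive frames.
chain = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
a_st = spatio_temporal_adjacency(chain, num_frames=3)
feats = [[float(p), 1.0] for p in range(9)]  # 9 nodes, 2 channels
out = aggregate(a_st, feats)
```

In a full graph convolution layer, the aggregated features would additionally be multiplied by a learnable weight matrix; that step is omitted here to keep the sketch dependency-free.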