• T-STAM: An End-to-End Action Recognition Model Based on a Two-Stream Spatio-Temporal Attention Mechanism

    Subjects: Computer Science >> Integration Theory of Computer Science · Submitted: 2020-09-28 · Cooperative journal: 《计算机应用研究》

    Abstract: Two-stream action recognition methods ignore the inter-relationships between feature channels and carry a large amount of redundant spatio-temporal information. To address these problems, this paper proposed an end-to-end action recognition model based on a two-stream network with a spatio-temporal attention mechanism (T-STAM), which makes full use of the key spatio-temporal information in a video. Firstly, this paper introduced a channel attention mechanism into the two-stream base networks, calibrating the channel information by modeling the dependencies between feature channels to improve feature representation. Secondly, it proposed a CNN-based temporal attention model that learns an attention score for each frame with few parameters and focuses on the frames with significant motion amplitude. At the same time, it proposed a multi-spatial attention model that computes an attention score for each position in a frame from different angles to extract motion-salient regions. Then, the temporal and spatial features were fused to further enhance the feature representation of the video. Finally, the fused features were fed into the classification network, and the results of the two streams were combined with different weights to obtain the recognition result. Experimental results on the HMDB51 and UCF101 datasets show that T-STAM can effectively recognize actions in video.
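
    To make the three attention components in the abstract concrete, the sketch below shows one possible PyTorch realization: an SE-style channel attention block, a CNN-based temporal attention that scores frames with a small 1-D convolution, a multi-head spatial attention that pools motion-salient regions, and a weighted late fusion of the two streams' class scores. All module names, layer sizes, head counts, and fusion weights are illustrative assumptions; the abstract does not specify the paper's exact architecture.

```python
# Minimal sketch of the attention modules described in the abstract (assumed design, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """SE-style channel attention: recalibrate channels by modeling their dependencies."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                     # squeeze spatial dims -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1) # per-channel weights in (0, 1)
        return x * w                               # channel-wise recalibration


class TemporalAttention(nn.Module):
    """CNN-based temporal attention: score each frame with a 1-D convolution over time."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):                          # x: (B, T, C) per-frame features
        scores = self.conv(x.transpose(1, 2))      # (B, 1, T) frame scores
        alpha = F.softmax(scores.squeeze(1), dim=1)
        return (x * alpha.unsqueeze(-1)).sum(dim=1)  # attention-weighted video feature (B, C)


class MultiSpatialAttention(nn.Module):
    """Several spatial attention heads highlight motion-salient regions from different 'angles'."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList([nn.Conv2d(channels, 1, kernel_size=1) for _ in range(heads)])

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        pooled = []
        for head in self.heads:
            a = F.softmax(head(x).view(b, -1), dim=1).view(b, 1, h, w)  # spatial attention map
            pooled.append((x * a).sum(dim=(2, 3)))                      # attended feature (B, C)
        return torch.stack(pooled, dim=1).mean(dim=1)                   # average over heads


def fuse_streams(rgb_logits, flow_logits, w_rgb: float = 1.0, w_flow: float = 1.5):
    """Late fusion of the two streams' class scores with different weights (weights are assumptions)."""
    return w_rgb * rgb_logits + w_flow * flow_logits


if __name__ == "__main__":
    # Example shapes: 8 clips, 16 frames, 512-channel 7x7 feature maps from one stream's backbone.
    frame_maps = torch.randn(8, 512, 7, 7)
    frame_vecs = torch.randn(8, 16, 512)
    region_feat = MultiSpatialAttention(512)(ChannelAttention(512)(frame_maps))  # (8, 512)
    video_feat = TemporalAttention(512)(frame_vecs)                              # (8, 512)
```

    In this reading, each stream applies channel attention inside its backbone, spatial attention on the per-frame feature maps, and temporal attention on the per-frame feature vectors; the resulting features are classified per stream and the two score vectors are combined by a weighted sum, with the optical-flow stream typically given a slightly larger weight.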