2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID MLSP-17.3
Paper Title Symmetric Sub-graph Spatio-Temporal Graph Convolution and its application in Complex Activity Recognition
Authors Pratyusha Das, Antonio Ortega, University of Southern California, United States
Session MLSP-17: Graph Neural Networks
Location Gather.Town
Session Time: Wednesday, 09 June, 14:00 - 14:45
Presentation Time: Wednesday, 09 June, 14:00 - 14:45
Presentation Poster
Topic Machine Learning for Signal Processing: [MLR-DEEP] Deep learning techniques
Abstract Understanding complex hand actions from hand skeleton data is an important yet challenging task. In this paper, we analyze hand skeleton-based complex activities by modeling dynamic hand skeletons through a spatio-temporal graph convolutional network (ST-GCN). This model jointly learns and extracts spatio-temporal features for activity recognition. Our proposed technique, the Symmetric Sub-graph Spatio-Temporal Graph Convolutional Neural Network (S^2-ST-GCN), exploits the symmetric nature of hand graphs to decompose them into sub-graphs, which allows us to build a separate temporal model for the relative motion of the fingers. This sub-graph approach can be implemented efficiently by preprocessing the input data with a Haar unit-based orthogonal matrix. Then, in addition to spatial filters, separate temporal filters can be learned for each sub-graph. We evaluate the proposed method on the First-Person Hand Action dataset. While the proposed method performs comparably to state-of-the-art methods in the train:test = 1:1 setting, it does so with greater stability. Furthermore, we demonstrate a significant performance improvement over state-of-the-art methods in the cross-person setting. S^2-ST-GCN also outperforms a finger-based decomposition of the hand graph in which no preprocessing is applied.
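
The Haar-unit preprocessing step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the matrix itself, so everything below is an assumption made for illustration: a toy 5-node graph, a hypothetical list of symmetric node pairs, and the helper names `haar_unit_matrix` and `apply_transform`. Each pair is mapped to a scaled sum (shared motion) and a scaled difference (relative motion) by a 2x2 Haar unit, which keeps the overall matrix orthogonal. It is not the authors' implementation.

```python
import math


def haar_unit_matrix(num_nodes, symmetric_pairs):
    """Build an orthogonal matrix applying a 2x2 Haar unit to each node pair.

    Assumption: `symmetric_pairs` is a hypothetical list of disjoint (i, j)
    index pairs for nodes that are symmetric in the graph. Nodes that are
    not in any pair pass through unchanged.
    """
    h = 1.0 / math.sqrt(2.0)
    # Start from the identity so unpaired nodes are left untouched.
    U = [[1.0 if r == c else 0.0 for c in range(num_nodes)]
         for r in range(num_nodes)]
    for i, j in symmetric_pairs:
        U[i][i], U[i][j] = h, h    # row i: scaled sum (shared component)
        U[j][i], U[j][j] = h, -h   # row j: scaled difference (relative component)
    return U


def apply_transform(U, x):
    """Multiply the matrix U by the per-node signal x (one value per node)."""
    return [sum(U[r][c] * x[c] for c in range(len(x))) for r in range(len(U))]


# Toy example: node 0 is unpaired; (1, 2) and (3, 4) are assumed symmetric.
U = haar_unit_matrix(5, [(1, 2), (3, 4)])
x = [1.0, 2.0, 4.0, 3.0, 3.0]
y = apply_transform(U, x)
print(y)
```

In this sketch the difference rows isolate how the two nodes of a pair move relative to each other, which is the kind of signal a separate temporal filter could then be learned on. Because the matrix is orthogonal, the transform preserves signal energy and can be inverted by its transpose.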