Paper ID | MLSP-17.3
Paper Title | Symmetric Sub-graph Spatio-Temporal Graph Convolution and its Application in Complex Activity Recognition
Authors | Pratyusha Das, Antonio Ortega, University of Southern California, United States
Session | MLSP-17: Graph Neural Networks
Location | Gather.Town
Session Time | Wednesday, 09 June, 14:00 - 14:45
Presentation Time | Wednesday, 09 June, 14:00 - 14:45
Presentation | Poster
Topic | Machine Learning for Signal Processing: [MLR-DEEP] Deep learning techniques
Abstract |
Understanding complex hand actions from hand skeleton data is an important yet challenging task. In this paper, we analyze hand skeleton-based complex activities by modeling dynamic hand skeletons through a spatio-temporal graph convolutional network (ST-GCN). This model jointly learns and extracts spatio-temporal features for activity recognition. Our proposed technique, the Symmetric Sub-graph Spatio-Temporal Graph Convolutional Network (S^2-ST-GCN), exploits the symmetric nature of hand graphs to decompose them into sub-graphs, allowing us to build a separate temporal model for the relative motion of the fingers. This sub-graph approach can be implemented efficiently by preprocessing the input data with a Haar-unit-based orthogonal matrix. Then, in addition to spatial filters, separate temporal filters can be learned for each sub-graph. We evaluate the performance of the proposed method on the First-Person Hand Action dataset. While the proposed method performs comparably to state-of-the-art methods in the 1:1 train:test setting, it achieves this with greater stability. Furthermore, we demonstrate a significant performance improvement over state-of-the-art methods in the cross-person setting. S^2-ST-GCN also outperforms a finger-based decomposition of the hand graph in which no preprocessing is applied.
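The Haar-unit preprocessing mentioned in the abstract can be illustrated with a small sketch. The idea of a Haar unit is to map each pair of symmetric graph nodes to a scaled sum (joint motion) and a scaled difference (relative motion), which yields an orthogonal transform. The node pairing below is a hypothetical toy example, not the paper's actual hand-graph decomposition, and `haar_unit_matrix` is an illustrative helper, not code from the authors.

```python
import numpy as np

def haar_unit_matrix(n_nodes, pairs):
    """Build an orthogonal matrix from Haar units.

    Each (i, j) in `pairs` maps two symmetric nodes to an
    average (sum) channel and a difference channel, scaled by
    1/sqrt(2) so the overall matrix stays orthogonal.
    Unpaired nodes are passed through unchanged.
    """
    H = np.zeros((n_nodes, n_nodes))
    s = 1.0 / np.sqrt(2.0)
    paired = set()
    for i, j in pairs:
        H[i, i], H[i, j] = s, s    # sum channel: joint motion
        H[j, i], H[j, j] = s, -s   # difference channel: relative motion
        paired.update((i, j))
    for k in range(n_nodes):
        if k not in paired:
            H[k, k] = 1.0          # identity for unpaired nodes
    return H

# Toy example: 6 nodes with hypothetical symmetric pairs.
pairs = [(0, 3), (1, 4), (2, 5)]
H = haar_unit_matrix(6, pairs)
assert np.allclose(H @ H.T, np.eye(6))  # orthogonality check

# X: skeleton signal of shape (time T, nodes N, coordinates C).
# Applying H along the node axis splits the signal into "sum" and
# "relative motion" components; separate temporal filters can then
# be learned on each resulting sub-graph signal.
X = np.random.randn(20, 6, 3)
Y = np.einsum('mn,tnc->tmc', H, X)
```

Because H is orthogonal, the transform is energy-preserving and trivially invertible (H.T undoes it), so the preprocessing loses no information while exposing the symmetric structure to the network.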