2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID: BIO-10.1
Paper Title: HIERARCHICAL ATTENTION-BASED TEMPORAL CONVOLUTIONAL NETWORKS FOR EEG-BASED EMOTION RECOGNITION
Authors: Chao Li, Boyang Chen, Ziping Zhao, Tianjin Normal University, China; Nicholas Cummins, King's College London, United Kingdom; Björn Schuller, University of Augsburg, Germany
Session: BIO-10: Deep Learning for EEG Analysis
Location: Gather.Town
Session Time: Thursday, 10 June, 13:00 - 13:45
Presentation Time: Thursday, 10 June, 13:00 - 13:45
Presentation: Poster
Topic: Biomedical Imaging and Signal Processing: [BIO] Biomedical signal processing
Abstract: EEG-based emotion recognition is an effective way to infer the inner emotional state of human beings. Recently, deep learning methods, particularly long short-term memory recurrent neural networks (LSTM-RNNs), have made encouraging progress in the field of emotion recognition. However, LSTM-RNNs are time-consuming to train and have difficulty avoiding the problem of exploding/vanishing gradients during training. In addition, EEG-based emotion recognition often suffers from the presence of silent and emotionally irrelevant frames within each channel, and not all channels carry the same emotionally discriminative information. To tackle these problems, a hierarchical attention-based temporal convolutional network (HATCN) for efficient EEG-based emotion recognition is proposed. Firstly, a spectrogram representation is generated from the raw EEG signal in each channel to capture its time and frequency information. Secondly, temporal convolutional networks (TCNs) are utilised to automatically learn robust, intrinsic long-term dynamic characteristics of the emotional response. Next, a hierarchical attention mechanism is investigated that aggregates emotional information at both the frame and channel levels. Experimental results on the DEAP dataset show that our method achieves an average recognition accuracy of 0.716 and an F1-score of 0.642 over four emotional dimensions, outperforming other state-of-the-art methods in a user-independent scenario.
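For readers who want a concrete picture of the pipeline the abstract outlines, below is a minimal PyTorch sketch of a HATCN-style model: per-channel spectrogram frames are embedded, passed through a stack of dilated causal convolutions (a TCN), pooled by a frame-level attention within each channel, and pooled again by a channel-level attention before classification. All layer sizes, the kernel width, the number of TCN blocks, and the additive-attention form are illustrative assumptions; the paper's exact architecture and hyperparameters are not given on this page.

```python
# Hypothetical HATCN-style sketch: spectrogram -> per-channel TCN ->
# frame-level attention -> channel-level attention -> classifier.
# Sizes and the attention formulation are assumptions, not the authors' design.
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention pooling over a sequence dimension."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1))

    def forward(self, x):                          # x: (batch, seq, dim)
        w = torch.softmax(self.score(x), dim=1)    # weights: (batch, seq, 1)
        return (w * x).sum(dim=1)                  # pooled: (batch, dim)

class TCNBlock(nn.Module):
    """One dilated causal 1-D convolution block with a residual connection."""
    def __init__(self, dim, dilation, kernel=3):
        super().__init__()
        self.pad = (kernel - 1) * dilation         # left-pad to stay causal
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                          # x: (batch, dim, frames)
        y = nn.functional.pad(x, (self.pad, 0))
        return self.relu(self.conv(y)) + x         # residual keeps length

class HATCN(nn.Module):
    def __init__(self, n_freq=64, dim=64, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(n_freq, dim)         # embed spectrogram frames
        self.tcn = nn.Sequential(*[TCNBlock(dim, 2 ** i) for i in range(3)])
        self.frame_att = Attention(dim)            # pool frames per channel
        self.channel_att = Attention(dim)          # pool across EEG channels
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, x):      # x: (batch, channels, frames, freq_bins)
        b, c, t, f = x.shape
        h = self.proj(x.reshape(b * c, t, f))              # (b*c, t, dim)
        h = self.tcn(h.transpose(1, 2)).transpose(1, 2)    # temporal modelling
        h = self.frame_att(h).reshape(b, c, -1)            # (b, c, dim)
        return self.fc(self.channel_att(h))                # (b, n_classes)

# Example: batch of 4, 32 EEG channels, 128 spectrogram frames, 64 freq bins.
logits = HATCN()(torch.randn(4, 32, 128, 64))
print(logits.shape)            # torch.Size([4, 2])
```

The two Attention instances realise the hierarchy the abstract describes: the first learns which frames within a channel are emotionally relevant, the second learns which channels are most discriminative, addressing the silent-frame and unequal-channel problems in one model.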