IEEE ICASSP 2021 || Toronto, Ontario, Canada || 6-11 June 2021

Paper Detail

Paper ID

MMSP-2.5

Paper Title

Reinforcement Stacked Learning with Semantic-Associated Attention for Visual Question Answering

Authors

Xinyu Xiao, Tencent, China; Chunxia Zhang, School of Computer Science and Technology, Beijing Institute of Technology, China; Shiming Xiang, Chunhong Pan, Institute of Automation, Chinese Academy of Sciences, China

Session

MMSP-2: Deep Learning for Multimedia Analysis and Processing

Location

Gather.Town

Session Time:

Tuesday, 08 June, 14:00 - 14:45

Presentation Time:

Tuesday, 08 June, 14:00 - 14:45

Presentation

Poster

Topic

Multimedia Signal Processing: Emerging Areas in Multimedia

Abstract

In essence, visual question answering (VQA) is a process of embedding and transforming two modalities, image and text. Two critical problems in this process remain unresolved: how to embed the question and image features effectively, and how to transform those features into an answer prediction. To address these problems, this paper proposes a semantic-associated attention method and a reinforcement stacked learning mechanism. First, drawing on associations among high-level semantics, a visual spatial attention model (VSA) and a multi-semantic attention model (MSA) extract the low-level image feature and the high-level semantic feature, respectively. Second, a reinforcement stacked learning architecture splits the transformation process into multiple stages so that the model approaches the answer gradually. At each stage, a new reinforcement learning (RL) method directly criticizes inappropriate answers to optimize the model. Extensive experiments on the VQA task show that the method achieves state-of-the-art performance.
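The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea of question-guided attention refined over several stages. It is not the authors' VSA, MSA, or RL method. It assumes plain dot-product soft attention and an additive query update, and every name in it (`attend`, `stacked_attention`, `regions`, `question`, `num_stages`) and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, regions):
    """Dot-product soft attention (an assumption, not the paper's model).

    Scores each region feature by its dot product with the query and
    returns the attention weights plus the weighted sum of the regions.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(len(query))
    ]
    return weights, context


def stacked_attention(question, regions, num_stages=2):
    """Refine the query over several stages (hypothetical additive update)."""
    query = list(question)
    history = []
    for _ in range(num_stages):
        weights, context = attend(query, regions)
        history.append(weights)
        # Fold the attended image context back into the query for the next stage.
        query = [q + c for q, c in zip(query, context)]
    return query, history


# Hypothetical toy inputs: three image-region features and one question feature.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]
final_query, history = stacked_attention(question, regions, num_stages=2)
```

Each entry of `history` holds one stage's attention weights over the regions. Comparing the entries shows how the focus can shift once earlier image context is folded into the query.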

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
