Paper ID | MMSP-8.4
Paper Title | BIDIRECTIONAL FOCUSED SEMANTIC ALIGNMENT ATTENTION NETWORK FOR CROSS-MODAL RETRIEVAL
Authors | Shuli Cheng, Liejun Wang, Anyu Du, Yongming Li; Xinjiang University, China
Session | MMSP-8: Multimedia Retrieval and Signal Detection
Location | Gather.Town
Session Time | Friday, 11 June, 13:00 - 13:45
Presentation Time | Friday, 11 June, 13:00 - 13:45
Presentation | Poster
Topic | Multimedia Signal Processing: Multimedia Applications
Abstract | Cross-modal retrieval is a challenging and significant task in intelligent understanding. Researchers have tried to capture modal semantic information through weighted attention mechanisms, but these methods can neither eliminate the negative effects of irrelevant semantic information nor capture fine-grained modal semantic information. To capture multi-modal semantic information more accurately, a bidirectional focused semantic alignment attention network (BFSAAN) is proposed for cross-modal retrieval tasks. The core ideas of BFSAAN are as follows: 1) A bidirectional focused attention mechanism is adopted to share modal semantic information, further eliminating the negative influence of irrelevant semantic information. 2) Strip pooling, a lightweight spatial attention mechanism, is applied to the image and text modalities to capture modal spatial semantic information. 3) Second-order covariance pooling is explored to obtain multi-modal semantic representations, capturing modal channel semantic information and achieving semantic alignment between the image and text modalities. Experiments are conducted on two standard cross-modal retrieval datasets (Flickr30K and MS COCO), covering four aspects: performance comparison, ablation analysis, algorithm convergence, and visual analysis. The results show that BFSAAN achieves better cross-modal retrieval performance.
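
The abstract's second idea applies strip pooling as a lightweight spatial attention mechanism. Below is a minimal PyTorch sketch of strip pooling in the general style of Hou et al. (CVPR 2020); the abstract does not give the paper's exact configuration, so the kernel sizes, the fusion convolution, and the sigmoid gating here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Lightweight spatial attention via horizontal/vertical strip pooling (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # H x 1 strips: average over width; 1 x W strips: average over height.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        # 1-D convolutions along each strip direction (kernel size 3 is an assumption).
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Pool along each axis, refine with 1-D convs, then broadcast back to H x W.
        sh = self.conv_h(self.pool_h(x)).expand(b, c, h, w)
        sw = self.conv_w(self.pool_w(x)).expand(b, c, h, w)
        # Fused strip responses gate the input as a spatial attention map.
        attn = torch.sigmoid(self.fuse(F.relu(sh + sw)))
        return x * attn

# Example: StripPooling(256)(torch.randn(2, 256, 7, 7)) -> (2, 256, 7, 7)
```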
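
The third idea relies on second-order covariance pooling for channel-level semantics. The sketch below shows generic covariance pooling over a set of local features (image regions or words); the (B, N, C) layout, the function name, and the diagonal regularization are assumptions, and any normalization the paper may apply afterwards (e.g., a matrix square root as in iSQRT-COV variants) is omitted.

```python
import torch

def covariance_pooling(feats: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Compute a channel-wise covariance representation of a feature set.

    feats: (B, N, C) -- N local features (regions or words) with C channels.
    Returns: (B, C, C) covariance matrices capturing channel interactions.
    """
    b, n, c = feats.shape
    # Center the features across the N positions.
    centered = feats - feats.mean(dim=1, keepdim=True)
    # Covariance: (1 / (N - 1)) * X^T X over the position axis.
    cov = centered.transpose(1, 2) @ centered / (n - 1)
    # Regularize the diagonal for numerical stability.
    cov = cov + eps * torch.eye(c, device=feats.device)
    return cov

# Example: covariance_pooling(torch.randn(2, 36, 512)) -> (2, 512, 512)
```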