Paper ID | IVMSP-26.2 |
Paper Title |
ATTENTION-GUIDED SECOND-ORDER POOLING CONVOLUTIONAL NETWORKS |
Authors |
Shannan Chen, Dalian University, China; Qiule Sun, Dalian University of Technology, China; Cunhua Li, Jiangsu Ocean University, China; Jianxin Zhang, Dalian Minzu University, China; Qiang Zhang, Dalian University of Technology, China |
Session | IVMSP-26: Attention for Vision |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation |
Poster |
Topic |
Image, Video, and Multidimensional Signal Processing: [IVSMR] Image & Video Sensing, Modeling, and Representation |
Abstract |
Recently, channel attention-guided convolutional networks (ConvNets) have achieved notable advances on visual recognition tasks. However, they mainly exploit coarse first-order statistics to characterize the holistic image and rarely model long-range feature dependencies, which limits their representation power to a certain extent. To address these limitations, this paper proposes a novel attention-guided second-order pooling convolutional network (ASP-Net). ASP-Net introduces bilinear pooling, which captures pairwise feature interactions, to model second-order statistics. Meanwhile, it explicitly collects long-range dependencies via non-local operations, thus providing a global view in lower layers. The second-order statistics and non-local context features are then fused to obtain an enhanced representation, which is used to predict a channel-wise attention map and scale the convolutional features. Experimental results on three commonly used datasets demonstrate that ASP-Net outperforms its counterparts and achieves competitive performance. The source code is available at https://github.com/ShannanChen/ASPNet. |
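
The pipeline described in the abstract (second-order statistics, non-local context, fusion, channel-wise scaling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the linked repository. Everything below is an assumption made for illustration: the function names, the row-wise mean used to reduce the bilinear matrix to a vector, the softmax affinity used for the non-local step, the additive fusion, and the single hypothetical gating matrix `w_gate`.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def second_order_descriptor(x):
    """Bilinear pooling over positions; x has shape (C, N).

    Builds the C x C matrix of pairwise channel interactions and reduces
    it to one value per channel with a row-wise mean (an assumed choice).
    """
    n = x.shape[1]
    bilinear = x @ x.T / n           # (C, C) pairwise feature interactions
    return bilinear.mean(axis=1)     # (C,)


def non_local_descriptor(x):
    """Simplified non-local operation; x has shape (C, N).

    Every position aggregates features from all positions, weighted by a
    softmax over dot-product affinities (an assumed form of the operation).
    """
    c = x.shape[0]
    affinity = softmax(x.T @ x / np.sqrt(c), axis=1)  # (N, N), rows sum to 1
    context = x @ affinity.T                           # (C, N)
    return context.mean(axis=1)                        # (C,)


def asp_attention(feature_map, w_gate):
    """Fuse both descriptors and rescale channels; feature_map is (C, H, W).

    w_gate is a hypothetical (C, C) weight matrix standing in for whatever
    learned layers predict the attention map.
    """
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)
    fused = second_order_descriptor(x) + non_local_descriptor(x)  # assumed fusion
    attention = 1.0 / (1.0 + np.exp(-(w_gate @ fused)))           # sigmoid gate
    return feature_map * attention[:, None, None], attention


# Usage with random data standing in for a convolutional feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
w_gate = rng.standard_normal((8, 8)) * 0.1
scaled, attention = asp_attention(features, w_gate)
```

Here `scaled` has the same shape as `features`, and `attention` holds one weight per channel. In a trained network the gating weights would be learned, and the reduction and fusion steps may differ from the choices assumed in this sketch.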