Paper ID | IVMSP-26.2 | ||
Paper Title | ATTENTION-GUIDED SECOND-ORDER POOLING CONVOLUTIONAL NETWORKS | ||
Authors | Shannan Chen, Dalian University, China; Qiule Sun, Dalian University of Technology, China; Cunhua Li, Jiangsu Ocean University, China; Jianxin Zhang, Dalian Minzu University, China; Qiang Zhang, Dalian University of Technology, China | ||
Session | IVMSP-26: Attention for Vision | ||
Location | Gather.Town | ||
Session Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Image, Video, and Multidimensional Signal Processing: [IVSMR] Image & Video Sensing, Modeling, and Representation | ||
Abstract | Recently, channel attention-guided convolutional networks (ConvNets) have shown great advances on visual recognition tasks. However, they mainly exploit coarse first-order statistics to characterize the holistic image and rarely focus on long-range feature dependencies, which limits their representation power to a certain extent. To address these limitations, this paper proposes a novel attention-guided second-order pooling convolutional network (ASP-Net). ASP-Net introduces bilinear pooling, which captures pairwise feature interactions, to model second-order statistics. Meanwhile, it explicitly collects long-range dependencies via non-local operations, thus providing a global view in lower layers. The second-order statistics and non-local context features are then fused to obtain an enhanced representation for predicting the channel-wise attention map and scaling the convolutional features. Experimental results on three commonly used datasets show that ASP-Net outperforms its counterparts and achieves competitive performance. The source code is available at https://github.com/ShannanChen/ASPNet. |
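
The pipeline described in the abstract (second-order pooling, non-local context, fusion, channel-wise scaling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The function names, the covariance form of bilinear pooling, the embedding-free non-local operation, the additive fusion, and the two-layer gating weights `w1` and `w2` are all assumptions chosen for illustration.

```python
import numpy as np


def second_order_pool(x):
    """Bilinear (covariance) pooling of a (C, H, W) feature map into (C, C)."""
    c = x.shape[0]
    feats = x.reshape(c, -1)                           # (C, N), N = H * W
    feats = feats - feats.mean(axis=1, keepdims=True)  # center each channel
    return feats @ feats.T / feats.shape[1]            # pairwise channel interactions


def non_local_context(x):
    """Simplified non-local operation (assumption: no learned embeddings)."""
    c, h, w = x.shape
    feats = x.reshape(c, -1)                           # (C, N)
    sim = feats.T @ feats                              # (N, N) position similarities
    sim = sim - sim.max(axis=1, keepdims=True)         # stabilize the softmax
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)            # each row sums to 1
    out = feats @ attn.T                               # aggregate over all positions
    return out.reshape(c, h, w)


def channel_attention(x, w1, w2):
    """Fuse both cues into per-channel scales and rescale x (hypothetical fusion)."""
    stat = second_order_pool(x).mean(axis=1)           # (C,) summary of second-order stats
    ctx = non_local_context(x).mean(axis=(1, 2))       # (C,) pooled long-range context
    fused = stat + ctx                                 # assumed fusion: element-wise sum
    hidden = np.maximum(w1 @ fused, 0.0)               # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # sigmoid gate in (0, 1)
    return x * scale[:, None, None], scale


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))                     # toy feature map: 8 channels, 4x4
w1 = rng.standard_normal((2, 8)) * 0.1                 # hypothetical reduction weights
w2 = rng.standard_normal((8, 2)) * 0.1                 # hypothetical expansion weights
y, scale = channel_attention(x, w1, w2)
print(y.shape, scale.shape)
```

The sketch keeps the two cues separate until the fusion step so that either one can be swapped out; a learned fusion or a different covariance normalization would slot into `channel_attention` without changing the other functions.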