2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper ID: IVMSP-22.3
Paper Title: HIERARCHICAL CONTEXT GUIDED AGGREGATION NETWORK FOR STEREO MATCHING
Authors: Jun Peng, Wangduo Xie, Zijing Huang, Wei Chen, Yong Zhao, Shenzhen Graduate School of Peking University, China
Session: IVMSP-22: Image & Video Sensing, Modeling and Representation
Location: Gather.Town
Session Time: Thursday, 10 June, 14:00 - 14:45
Presentation Time: Thursday, 10 June, 14:00 - 14:45
Presentation: Poster
Topic: Image, Video, and Multidimensional Signal Processing: [IVARS] Image & Video Analysis, Synthesis, and Retrieval
Abstract: CNN-based stereo matching methods have achieved remarkable performance, and efficiently exploiting contextual information in the cost aggregation stage is key to improving it further. In this paper, we propose a simple yet efficient network named Hierarchical Context Guided Aggregation Network (HCGANet). Specifically, we develop a novel cost aggregation module to replace the widely used 3D convolutions. First, we construct pyramid cost volumes that carry multi-level distinctive and discriminative representations. Next, an intra-level aggregation module is presented for single-level regularization and contextual information learning. Finally, we develop an inter-level aggregation module that hierarchically regularizes the cost volumes under guidance from coarser scales. The proposed aggregation module is lightweight and complementary, further improving the robustness and performance of disparity estimation. Extensive experiments on the SceneFlow and KITTI benchmarks demonstrate that the proposed method achieves superior results in both efficiency and accuracy.
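The abstract does not specify how the pyramid cost volumes are built or how coarser scales guide finer ones. The NumPy sketch below illustrates one generic coarse-to-fine scheme under stated assumptions: a correlation cost at each level, 2x average pooling between levels, a fixed additive fusion weight `alpha`, and soft-argmax disparity regression. All function names and parameters are hypothetical, and the fixed fusion merely stands in for HCGANet's learned intra-level and inter-level aggregation modules; it is not the authors' method.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Correlation cost volume of shape (max_disp, H, W).

    left, right: feature maps of shape (C, H, W). Entry [d, y, x] is the
    channel-averaged product of left[:, y, x] and right[:, y, x - d];
    pixels with x < d have no candidate match and keep a cost of 0.
    """
    _, _, w = left.shape
    cost = np.zeros((max_disp,) + left.shape[1:], dtype=left.dtype)
    for d in range(max_disp):
        cost[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return cost


def avg_pool2(x):
    """2x2 average pooling over the last two axes of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def build_pyramid(left, right, max_disp, levels):
    """Cost volumes from finest to coarsest; resolution and disparity range halve per level."""
    factor = 2 ** (levels - 1)
    if max_disp % factor or left.shape[1] % factor or left.shape[2] % factor:
        raise ValueError("max_disp, H and W must be divisible by 2**(levels - 1)")
    volumes = [correlation_cost_volume(left, right, max_disp)]
    for _ in range(levels - 1):
        left, right, max_disp = avg_pool2(left), avg_pool2(right), max_disp // 2
        volumes.append(correlation_cost_volume(left, right, max_disp))
    return volumes


def upsample2(cost):
    """Nearest-neighbour 2x upsampling along the disparity, height and width axes."""
    return cost.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


def coarse_to_fine(volumes, alpha=0.5):
    """Add the upsampled, already-fused coarser volume to each finer one.

    `alpha` is a hypothetical fixed guidance weight standing in for learned layers.
    """
    fused = volumes[-1]
    for fine in reversed(volumes[:-1]):
        fused = fine + alpha * upsample2(fused)
    return fused


def soft_argmax_disparity(cost, temperature=1.0):
    """Per-pixel disparity as the softmax-weighted mean over candidate disparities.

    Correlation is a similarity (higher = better match), so the softmax is taken
    over +cost; a smaller temperature gives a sharper distribution.
    """
    z = cost / temperature
    z = z - z.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    prob = np.exp(z)
    prob /= prob.sum(axis=0, keepdims=True)
    disparities = np.arange(cost.shape[0], dtype=cost.dtype)[:, None, None]
    return (prob * disparities).sum(axis=0)


if __name__ == "__main__":
    # Synthetic pair: left[:, :, x] equals right[:, :, x - 4], i.e. a true disparity of 4 px.
    rng = np.random.default_rng(0)
    base = rng.standard_normal((128, 16, 36))
    left, right = base[:, :, :32], base[:, :, 4:36]
    pyramid = build_pyramid(left, right, max_disp=8, levels=3)
    print([v.shape for v in pyramid])  # [(8, 16, 32), (4, 8, 16), (2, 4, 8)]
    disp = soft_argmax_disparity(coarse_to_fine(pyramid), temperature=0.05)
    print(disp[:, 8:].mean())  # close to 4 away from the left border
```

The disparity axis is stretched together with the spatial axes during upsampling because a disparity of d pixels at half resolution corresponds to roughly 2d pixels at full resolution. A learned aggregation network would replace the fixed `alpha`-weighted sum with trainable layers.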