Paper ID | IVMSP-7.5 |
Paper Title |
MULTI-ORDER ADVERSARIAL REPRESENTATION LEARNING FOR COMPOSED QUERY IMAGE RETRIEVAL |
Authors |
Zhixiao Fu, Zhejiang University, China; Xinyuan Chen, East China Normal University, China; Jianfeng Dong, Zhejiang Gongshang University, China; Shouling Ji, Zhejiang University, China |
Session | IVMSP-7: Machine Learning for Image Processing I |
Location | Gather.Town |
Session Time: | Wednesday, 09 June, 13:00 - 13:45 |
Presentation Time: | Wednesday, 09 June, 13:00 - 13:45 |
Presentation |
Poster |
Topic |
Image, Video, and Multidimensional Signal Processing: [IVSMR] Image & Video Sensing, Modeling, and Representation |
Abstract |
This paper addresses the task of composed query image retrieval. Given a composed query consisting of a reference image and modification text, the task aims to retrieve images that are generally similar to the reference image but differ according to the modification text. The task is challenging because of the complexity of the composed query and the cross-modality characteristics of the query and candidate images. The common paradigm is to first obtain a fused feature of the reference image and the text, and then project it into a common embedding space with the candidate images. However, most existing works focus only on high-level representations, ignoring low-level representations that may be complementary to them. This paper therefore proposes a new Multi-order Adversarial Network (MAN), which uses multi-level representations and simultaneously explores their low-order and high-order interactions, obtaining low-order and high-order features. The low-order features reflect the patterns of the individual features themselves, while the high-order features capture the interactions between features. Moreover, we introduce an adversarial module to constrain the fusion of the reference image and the text. Extensive experiments on three datasets verify the effectiveness of MAN and demonstrate its state-of-the-art performance. |
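
The common paradigm mentioned in the abstract (fuse the reference image and text features, then compare the fused query with candidate images in a shared embedding space) can be illustrated with a minimal sketch. This is not the MAN model: the element-wise sum used for fusion, the function names `fuse`, `cosine`, and `retrieve`, and the toy 3-dimensional feature vectors are all assumptions made for illustration only.

```python
import math


def fuse(image_feat, text_feat):
    # Hypothetical fusion: element-wise sum. This is a placeholder
    # assumption, not the fusion used by MAN.
    return [i + t for i, t in zip(image_feat, text_feat)]


def cosine(a, b):
    # Cosine similarity between two vectors in the shared embedding space.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(image_feat, text_feat, candidates):
    # Rank candidate names by similarity to the fused query, best first.
    query = fuse(image_feat, text_feat)
    return sorted(
        candidates,
        key=lambda name: cosine(query, candidates[name]),
        reverse=True,
    )


# Toy features (assumed): axis 0 = reference content, axis 1 = requested change.
reference = [1.0, 0.0, 0.0]
modification = [0.0, 1.0, 0.0]
candidates = {
    "unchanged": [1.0, 0.0, 0.0],
    "modified": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}

ranking = retrieve(reference, modification, candidates)
print(ranking)
```

In this toy setup the candidate that keeps the reference content and also reflects the modification ranks first, which is the behaviour the task description asks for. A real system would replace the hand-written vectors with learned image and text encoders.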