Paper ID | MMSP-7.3 | ||
Paper Title | ECCL: EXPLICIT CORRELATION-BASED CONVOLUTION BOUNDARY LOCATOR FOR MOMENT LOCALIZATION | ||
Authors | Xinfang Liu, Shandong University, China; Xiushan Nie, Shandong Jianzhu University, China; Junya Teng, Shandong University, China; Fanchang Hao, Shandong Jianzhu University, China; Yilong Yin, Shandong University, China | ||
Session | MMSP-7: Multimodal Perception, Integration and Multisensory Fusion | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 13:00 - 13:45 | ||
Presentation Time | Friday, 11 June, 13:00 - 13:45 | ||
Presentation | Poster | ||
Topic | Multimedia Signal Processing: Human Centric Multimedia | ||
Abstract | Moment localization in videos using natural language refers to finding the video segment most relevant to a given natural-language query. In this paper, we present a new boundary-determining strategy called the explicit correlation-based convolution boundary locator (ECCL), which can handle videos and moments of any length while leveraging fine-grained matching relationships. In this method, we first train a deep network to obtain correlation scores between video clips and query statements. We then apply a convolution kernel to the correlation scores to generate the boundary probability distribution. Finally, the start and end time indexes of the video moment are obtained by solving an optimization problem. Experiments on two publicly available datasets demonstrate the feasibility of ECCL. |
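
The pipeline described in the abstract (per-clip correlation scores, a convolution kernel that turns them into boundary probabilities, then an optimization over start and end indexes) can be illustrated with a small sketch. The abstract does not specify the kernel, the normalization, or the objective, so everything here is an assumption chosen for illustration: the two-tap step kernels, the softmax with a `temperature` parameter, the product objective, and the hard-coded scores that stand in for the output of the trained network. The function names are hypothetical and are not taken from the paper.

```python
import math


def correlate(scores, kernel, offset):
    """Slide `kernel` over `scores`; kernel[0] is aligned with clip i + offset.

    Indexes that fall outside the video are clamped to the nearest valid clip.
    """
    n = len(scores)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + offset + j, 0), n - 1)
            acc += w * scores[idx]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax, used here to form a probability distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def boundary_distributions(scores, temperature=0.1):
    """Assumed step kernels: a rising edge marks a start, a falling edge an end."""
    start_resp = correlate(scores, (-1.0, 1.0), offset=-1)  # s[i] - s[i-1]
    end_resp = correlate(scores, (1.0, -1.0), offset=0)     # s[i] - s[i+1]
    start_prob = softmax([r / temperature for r in start_resp])
    end_prob = softmax([r / temperature for r in end_resp])
    return start_prob, end_prob


def locate_moment(scores, temperature=0.1):
    """Assumed objective: maximize P_start(s) * P_end(e) subject to s <= e."""
    start_prob, end_prob = boundary_distributions(scores, temperature)
    best, best_score = (0, 0), -1.0
    for s in range(len(scores)):
        for e in range(s, len(scores)):
            score = start_prob[s] * end_prob[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical correlation scores for 10 clips; clips 3..6 match the query.
example_scores = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
example_moment = locate_moment(example_scores)  # (3, 6)
```

Because the kernels and the search operate on a score sequence of arbitrary length, the sketch places no constraint on video or moment duration, which is the property the abstract emphasizes. The exhaustive search is quadratic in the number of clips and is used only for clarity.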