2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper ID: SPE-42.1
Paper Title: A STAGE MATCH FOR QUERY-BY-EXAMPLE SPOKEN TERM DETECTION BASED ON STRUCTURE INFORMATION OF QUERY
Authors: Junyao Zhan, Qianhua He, Jianbin Su, Yanxiong Li (South China University of Technology, China)
Session: SPE-42: Keyword Spotting
Location: Gather.Town
Session Time: Thursday, 10 June, 15:30 - 16:15
Presentation Time: Thursday, 10 June, 15:30 - 16:15
Presentation: Poster
Topic: Speech Processing: [SPE-GASR] General Topics in Speech Recognition
Abstract: State-of-the-art query-by-example spoken term detection (QbE-STD) strategies are usually based on segmental dynamic time warping (S-DTW). However, the sliding window in S-DTW may split the signal of a single word across different segments and produce many illegal candidates that must be compared with the query, which significantly reduces both the accuracy and the efficiency of detection. In this paper, we propose a stage match strategy based on the structure information of the query, represented by the unvoiced-voiced attributes of its portions. The strategy first locates potential candidates in utterances whose structure is similar to that of the query, and then matches the query against them with Type-Location DTW (TL-DTW), a modified DTW constrained by the pronunciation types of paired frames and by their relative positions within the voiced sub-segments. Experiments on the AISHELL-1 corpus showed that the proposed approach achieved a relative AUC improvement of 30.51% over S-DTW and sped up retrieval.
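The abstract describes the two stages only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each frame already carries a voicing label ('U' for unvoiced, 'V' for voiced) and a feature vector. Stage 1 collapses the labels into runs and keeps utterance spans whose run pattern equals the query's; stage 2 scores each span with a DTW that forbids pairing frames of different type and adds a penalty for voiced pairs whose relative positions inside their voiced runs differ. The run-pattern matching rule, the length-ratio bounds `min_ratio` and `max_ratio`, the soft position penalty `pos_weight`, the Euclidean local distance, the normalization by `n + m`, and every function name (`runs`, `relative_positions`, `locate_candidates`, `tl_dtw`, `stage_match`) are hypothetical choices, not details taken from the paper.

```python
import numpy as np

# Hypothetical sketch of a two-stage (structure filter + type/location DTW) matcher.
# It is NOT the paper's TL-DTW definition; labels, penalties and bounds are assumptions.


def runs(labels):
    """Collapse per-frame 'U'/'V' labels into (label, start, end) runs; end is exclusive."""
    out, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((labels[start], start, i))
            start = i
    return out


def relative_positions(labels):
    """Position (0..1) of each voiced frame within its voiced run; NaN for unvoiced frames."""
    pos = np.full(len(labels), np.nan)
    for lab, s, e in runs(labels):
        if lab == "V":
            pos[s:e] = np.arange(e - s) / max(e - s - 1, 1)
    return pos


def locate_candidates(q_labels, u_labels, min_ratio=0.5, max_ratio=2.0):
    """Stage 1: utterance spans with the query's U/V run pattern and a plausible length."""
    pattern = [lab for lab, _, _ in runs(q_labels)]
    u_runs, k = runs(u_labels), len(pattern)
    if k == 0:
        return []
    cands = []
    for i in range(len(u_runs) - k + 1):
        window = u_runs[i:i + k]
        if [lab for lab, _, _ in window] != pattern:
            continue
        s, e = window[0][1], window[-1][2]
        if min_ratio * len(q_labels) <= e - s <= max_ratio * len(q_labels):
            cands.append((s, e))
    return cands


def tl_dtw(q_feats, q_labels, c_feats, c_labels, pos_weight=1.0):
    """Stage 2: DTW that forbids pairing frames of different U/V type and penalizes
    voiced pairs whose relative positions within their voiced runs differ."""
    n, m = len(q_feats), len(c_feats)
    local = np.linalg.norm(q_feats[:, None, :] - c_feats[None, :, :], axis=-1)
    q_pos, c_pos = relative_positions(q_labels), relative_positions(c_labels)
    both_voiced = ~np.isnan(q_pos)[:, None] & ~np.isnan(c_pos)[None, :]
    pos_pen = np.where(both_voiced, np.abs(q_pos[:, None] - c_pos[None, :]), 0.0)
    same_type = np.array(list(q_labels))[:, None] == np.array(list(c_labels))[None, :]
    local = np.where(same_type, local + pos_weight * pos_pen, np.inf)  # hard type constraint
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)  # inf when no type-consistent alignment exists


def stage_match(q_feats, q_labels, u_feats, u_labels):
    """Run stage 1, score each candidate with stage 2, return (distance, span)."""
    best = (np.inf, None)
    for s, e in locate_candidates(q_labels, u_labels):
        d = tl_dtw(q_feats, q_labels, u_feats[s:e], u_labels[s:e])
        if d < best[0]:
            best = (d, (s, e))
    return best


# Toy example: the query is frames 4..12 of a random 18-frame "utterance".
rng = np.random.default_rng(0)
u_labels = "UUVV" + "UVVVUUVVV" + "UUVVU"
u_feats = rng.normal(size=(len(u_labels), 13))  # 13-dim stand-in for acoustic features
q_labels, q_feats = u_labels[4:13], u_feats[4:13].copy()
print(locate_candidates(q_labels, u_labels))  # [(0, 8), (4, 13), (8, 17)]
print(stage_match(q_feats, q_labels, u_feats, u_labels)[1])  # (4, 13)
```

In this sketch, making the type check a hard constraint means a span can only receive a finite cost if its collapsed U/V pattern equals the query's, so stage 1 discards only spans that stage 2 would reject anyway (plus those outside the length bounds), without running any DTW on them.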