2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-53.5
Paper Title A COMPARISON STUDY ON INFANT-PARENT VOICE DIARIZATION
Authors Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain, University of Illinois at Urbana-Champaign, United States
SessionSPE-53: Speaker Diarization
LocationGather.Town
Session Time:Friday, 11 June, 13:00 - 13:45
Presentation Time:Friday, 11 June, 13:00 - 13:45
Presentation Poster
Topic Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art algorithms in diarization. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing loss function, to find the best performance. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. We found that our best system achieved 43.8% DER on test dataset, compared to 55.4% DER achieved by LENA software. We also found that using convolutional feature extractor instead of logmel features significantly increases the performance of neural diarization.