2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID	AUD-23.6
Paper Title	FAST THRESHOLD OPTIMIZATION FOR MULTI-LABEL AUDIO TAGGING USING SURROGATE GRADIENT LEARNING
Authors	Thomas Pellegrini, Université de Toulouse III ; IRIT, France; Timothée Masquelier, CERCO UMR 5549, CNRS ; Université de Toulouse III, France
Session	AUD-23: Detection and Classification of Acoustic Scenes and Events 4: Datasets and metrics
Location	Gather.Town
Session Time:	Thursday, 10 June, 15:30 - 16:15
Presentation Time:	Thursday, 10 June, 15:30 - 16:15
Presentation	Poster
Topic	Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
IEEE Xplore Open Preview	Click here to view in IEEE Xplore
Virtual Presentation	Click here to watch in the Virtual Conference
Abstract	Multi-label audio tagging consists of assigning sets of tags to audio recordings. At inference time, thresholds are applied on the confidence scores outputted by a probabilistic classifier, in order to decide which classes are detected active. In this work, we consider having at disposal a trained classifier and we seek to automatically optimize the decision thresholds according to a performance metric of interest, in our case F-measure (micro-F1). We propose a new method, called SGLThresh for Surrogate Gradient Learning of Thresholds, that makes use of gradient descent. Since F1 is not differentiable, we propose to approximate the thresholding operation gradients with the gradients of a sigmoid function. We report experiments on three datasets, using state-of-the-art pre-trained deep neural networks. In all cases, SGLThresh outperformed three other approaches: a default threshold value (defThresh), an heuristic search algorithm and a method estimating F1 gradients numerically. It reached 54.9% F1 on AudioSet eval, compared to 50.7% with defThresh. SGLThresh is very fast and scalable to a large number of target tags.