Paper ID | AUD-31.2
Paper Title | A NEW DCASE 2017 RARE SOUND EVENT DETECTION BENCHMARK UNDER EQUAL TRAINING DATA: CRNN WITH MULTI-WIDTH KERNELS
Authors | Jan Baumann, Patrick Meyer, Timo Lohrenz, Technische Universität Braunschweig, Germany; Alexander Roy, Michael Papendieck, IAV GmbH, Germany; Tim Fingscheidt, Technische Universität Braunschweig, Germany
Session | AUD-31: Detection and Classification of Acoustic Scenes and Events 6: Events
Location | Gather.Town
Session Time | Friday, 11 June, 13:00 - 13:45
Presentation Time | Friday, 11 June, 13:00 - 13:45
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract | Rare sound event detection (rare SED) aims to extract valuable information from data that consist mostly of acoustic background noise. The task has a long research history and was part of the DCASE 2017 Challenge. State-of-the-art performance is currently achieved by a stacked combination of a CNN and an RNN, dubbed CRNN, which has also been applied successfully in other domains such as hybrid automatic speech recognition. In this work, we propose a new CRNN model for rare SED. The model uses a set of parallel convolutions with multiple kernel widths and operates on an extended feature representation of the log-mel spectrogram. Furthermore, we apply and optimize different post-processing methods for evaluation and analyze the proposed modifications in an ablation study. With the same training material used for all methods, the proposed model outperforms the so-far top-scoring networks of the DCASE Challenge on the test set, reducing the error rate by 6.13% absolute and improving the F1 score by 4.39% absolute, and thereby achieves a new benchmark result on the DCASE 2017 Rare SED data set under these conditions.
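The abstract reports results as event-based error rate (ER) and F1 score after post-processing the network outputs. The following is a minimal sketch of such an evaluation for a single target class, not the authors' method. It assumes, to our understanding of DCASE 2017 Task 2, onset-only matching with a 500 ms collar and at most one target event per file. It also assumes plain thresholding followed by median filtering as the post-processing; the paper's actual post-processing methods are not detailed here. All function names, the threshold, the filter width, and the frame hop are hypothetical. The challenge, as far as we know, computed these metrics per class (baby cry, glass break, gunshot) and averaged them; the sketch covers one class only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def median_filter(values: Sequence[float], width: int) -> List[float]:
    """Odd-width median filter with zero padding at both ends (as scipy.signal.medfilt does)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = [0.0] * half + list(values) + [0.0] * half
    # padded[i:i + width] is centred on values[i]; its middle element after sorting is the median.
    return [sorted(padded[i:i + width])[half] for i in range(len(values))]


def detect_onset(probs: Sequence[float], threshold: float, filter_width: int,
                 hop_s: float) -> Optional[float]:
    """Hypothetical post-processing: threshold frame-wise probabilities, median-filter
    the binary decisions, and return the first active frame as an onset in seconds."""
    binary = [1.0 if p >= threshold else 0.0 for p in probs]
    smoothed = median_filter(binary, filter_width)
    for frame, active in enumerate(smoothed):
        if active >= 0.5:
            return frame * hop_s
    return None  # no event detected in this file


@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0
    n_ref: int = 0


def update(counts: Counts, ref_onset: Optional[float], est_onset: Optional[float],
           collar_s: float = 0.5) -> None:
    """Onset-only matching of at most one reference and one estimated event per file."""
    if ref_onset is not None:
        counts.n_ref += 1
    if ref_onset is not None and est_onset is not None and abs(ref_onset - est_onset) <= collar_s:
        counts.tp += 1
        return
    # A missed or mislocated detection is a deletion (FN); a spurious or mislocated one an insertion (FP).
    if ref_onset is not None:
        counts.fn += 1
    if est_onset is not None:
        counts.fp += 1


def error_rate(c: Counts) -> float:
    # Single-class evaluation has no substitutions, so ER = (D + I) / N = (FN + FP) / N_ref.
    return (c.fn + c.fp) / c.n_ref


def f1_score(c: Counts) -> float:
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


if __name__ == "__main__":
    hop_s, threshold, width = 0.1, 0.5, 3  # hypothetical settings
    files = [  # (frame-wise probabilities, reference onset in s or None)
        ([0.1, 0.2, 0.1, 0.8, 0.9, 0.7, 0.2], 0.3),  # correct detection -> TP
        ([0.1, 0.1, 0.9, 0.1, 0.2], None),           # isolated spike removed by the filter
        ([0.1, 0.2, 0.3, 0.2], 0.2),                 # missed event -> FN
        ([0.1, 0.7, 0.8, 0.9, 0.2], None),           # false alarm -> FP
    ]
    counts = Counts()
    for probs, ref in files:
        update(counts, ref, detect_onset(probs, threshold, width, hop_s))
    print(counts, error_rate(counts), f1_score(counts))
```

Running the script prints `Counts(tp=1, fp=1, fn=1, n_ref=2) 1.0 0.5`. The toy data illustrate why post-processing matters: the median filter suppresses the single-frame spike in the second file, which would otherwise count as an insertion and raise the ER. The ER is normalized by the number of reference events and can exceed 1, whereas F1 stays in [0, 1].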