2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information



Paper Detail

Paper ID: SPE-43.2
Authors: Takashi Fukuda, Gakuto Kurata; IBM Research AI, Japan
Session: SPE-43: Speech Recognition 15: Robust Speech Recognition 1
Session Time: Thursday, 10 June, 16:30 - 17:15
Presentation Time: Thursday, 10 June, 16:30 - 17:15
Presentation: Poster
Topic: Speech Processing: [SPE-ROBU] Robust Speech Recognition
Abstract: This paper proposes an improved generalized knowledge distillation framework with multiple dissimilar teacher networks, each specialized for a specific domain, to make a deployable student network more robust to challenging acoustic environments. We first describe a method for partitioning the training data to construct the teacher ensemble, using unsupervised neural clustering with features based on context-dependent phonemes that represent each acoustic domain. Second, we show how a single student network is trained effectively with multiple specialized teachers built from the partitioned data. During training, the weights of the student network are updated using a composite two-part cross-entropy loss obtained from a pair of teachers: a specialized teacher corresponding to the input speech and a generalized teacher trained on a balanced data set. Unlike system combination methods, we aim to incorporate the benefits of multiple models into a single student network via knowledge distillation, which adds no computational cost at decoding time. The improvement from the proposed technique is shown on acoustically diverse signals contaminated by challenging practical noises.
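
The composite two-part cross-entropy loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption rather than the authors' method: the two parts are taken to be soft cross-entropy terms against the specialized and generalized teacher posteriors, combined with a hypothetical interpolation weight `alpha`, and the function names are invented for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def soft_cross_entropy(teacher_probs, student_logits):
    """Frame-averaged cross entropy between teacher posteriors and student outputs."""
    per_frame = -(teacher_probs * log_softmax(student_logits)).sum(axis=-1)
    return float(per_frame.mean())


def composite_kd_loss(student_logits, specialized_probs, generalized_probs, alpha=0.5):
    """Two-part distillation loss (assumed form, not taken from the paper).

    specialized_probs: posteriors from the teacher matching the input's domain.
    generalized_probs: posteriors from the teacher trained on balanced data.
    alpha: hypothetical weight on the specialized-teacher term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    specialized_term = soft_cross_entropy(specialized_probs, student_logits)
    generalized_term = soft_cross_entropy(generalized_probs, student_logits)
    return alpha * specialized_term + (1.0 - alpha) * generalized_term
```

In this sketch the specialized teacher for each utterance would be chosen by the domain label produced by the clustering step, and only the student is kept for decoding, which is why the teachers add no cost at inference.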