2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information


Paper Detail

Paper ID: SPE-48.3
Paper Title: IMPROVED DATA SELECTION FOR DOMAIN ADAPTATION IN ASR
Authors: Shannon Wotherspoon, William Hartmann, Matthew Snover, Owen Kimball (Raytheon BBN, United States)
Session: SPE-48: Speech Recognition 18: Low Resource ASR
Location: Gather.Town
Session Time: Friday, 11 June, 11:30 - 12:15
Presentation Time: Friday, 11 June, 11:30 - 12:15
Presentation: Poster
Topic: Speech Processing: [SPE-GASR] General Topics in Speech Recognition
Abstract: Automatic speech recognition (ASR) systems are highly sensitive to mismatch between the training and test domains. Because transcription is often prohibitively expensive, it is important to be able to make use of available transcribed out-of-domain data. We address the problem of domain adaptation with semi-supervised training (SST). Unlike prior work on in-domain SST, we find significant performance improvements with as little as one hour of target-domain data, although the selection of that data is critical. We show that minimum phone error rate is a good oracle measure for selection, and we approximate this measure by the average phone confidence of an utterance. With larger domain shifts, we also find that deletions and low lexical diversity are serious issues, which we address by incorporating phone rate into our selection metric. With our proposed selection criterion, we see relative improvements of up to 57% over the out-of-domain baseline model. Furthermore, this selection method generalizes well, matching or outperforming word-level confidence selection across six separate domain-shift conditions.
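The selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Utterance` record, the per-phone confidence lists, the linear `rate_weight` combination of average phone confidence and phone rate, and the greedy fill of a fixed audio budget (one hour by default) are all hypothetical choices made for the example. The abstract does not specify how phone rate is incorporated into the metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    # Hypothetical record for one automatically transcribed target-domain utterance.
    utt_id: str
    duration_sec: float
    phone_confidences: List[float]  # assumed per-phone confidences from the decoder


def average_phone_confidence(utt: Utterance) -> float:
    # Proxy for (inverse) phone error rate: mean confidence over hypothesized phones.
    if not utt.phone_confidences:
        return 0.0
    return sum(utt.phone_confidences) / len(utt.phone_confidences)


def phone_rate(utt: Utterance) -> float:
    # Hypothesized phones per second; a low value can signal deletions.
    if utt.duration_sec <= 0:
        return 0.0
    return len(utt.phone_confidences) / utt.duration_sec


def selection_score(utt: Utterance, rate_weight: float = 0.0) -> float:
    # Assumed linear combination; rate_weight=0 ranks by phone confidence alone.
    return average_phone_confidence(utt) + rate_weight * phone_rate(utt)


def select_utterances(
    utts: List[Utterance], budget_sec: float = 3600.0, rate_weight: float = 0.0
) -> List[Utterance]:
    # Rank by score (best first), then greedily add utterances that fit the budget.
    ranked = sorted(utts, key=lambda u: selection_score(u, rate_weight), reverse=True)
    selected: List[Utterance] = []
    total = 0.0
    for utt in ranked:
        if total + utt.duration_sec > budget_sec:
            continue  # skip utterances that would exceed the audio budget
        selected.append(utt)
        total += utt.duration_sec
    return selected
```

With `rate_weight=0`, a short, very confident hypothesis with few phones can outrank a longer, fuller one. A positive weight favors utterances whose hypotheses do not appear to drop phones. In practice the phone rate would probably need normalizing so that its scale is comparable to a confidence in [0, 1].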