2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: SPE-45.5
Paper Title: Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification
Authors: Hao Huang, Kai Wang, Ying Hu, Xinjiang University, China; Sheng Li, National Institute of Information and Communications Technology, Japan
Session: SPE-45: Speech Analysis
Location: Gather.Town
Session Time: Thursday, 10 June, 16:30 - 17:15
Presentation Time: Thursday, 10 June, 16:30 - 17:15
Presentation: Poster
Topic: Speech Processing: [SPE-ANLS] Speech Analysis
IEEE Xplore: Open Preview available in IEEE Xplore
Abstract: We pursue an interpretable pitch tracking model and a jointly trained tone model for Mandarin tone classification. For pitch tracking, existing deep-learning-based pitch models seldom incorporate the Viterbi decoding commonly implemented in prevalent hand-designed pitch tracking algorithms. We propose an RNN-based encoder-decoder framework with a gating mechanism that implicitly models both the state-cost estimation and the Viterbi back-tracing pass implemented in the RAPT algorithm. We then apply the pitch extractor to a downstream Mandarin tone classification task. The basic motivation is to combine the two conventional components of tone classification (i.e., the pitch extractor and the tone classifier) so that the whole network can be trained simultaneously in an end-to-end fashion. Various cascade methods are evaluated. We carry out pitch extraction and tone classification experiments on a Mandarin continuous speech database to show the superiority of the proposed models. The pitch extraction results show that the proposed pitch tracking model outperforms the DNN-RNN baseline and its bi-directional variants. The tone classification results show that the composite model outperforms the traditional cascade framework, which relies on pitch-related features and a back-end classifier.
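The paper itself is not reproduced here, but the classic dynamic-programming pass that RAPT-style trackers implement (state-cost accumulation followed by Viterbi back-tracing), and which the proposed encoder-decoder aims to model, can be sketched in a few lines. This is a generic illustration, not the authors' code; the function name `viterbi_pitch_track` and the `jump_weight` transition penalty are illustrative assumptions.

```python
import numpy as np

def viterbi_pitch_track(local_cost, candidates, jump_weight=0.1):
    """Pick one pitch candidate per frame by minimizing local cost plus a
    transition cost that penalizes large frame-to-frame pitch jumps.

    local_cost: (T, K) array, cost of candidate k at frame t.
    candidates: (K,) array of candidate pitch values in Hz.
    jump_weight: illustrative weight on the pitch-jump penalty (assumption).
    Returns the (T,) array of chosen pitch values.
    """
    T, K = local_cost.shape
    # Transition cost between every pair of candidates (pitch distance).
    trans = jump_weight * np.abs(candidates[:, None] - candidates[None, :])

    cum = local_cost[0].copy()          # best cumulative cost ending at each candidate
    back = np.zeros((T, K), dtype=int)  # back-pointers for the back-tracing pass
    for t in range(1, T):
        total = cum[:, None] + trans            # (K_prev, K_next) combined costs
        back[t] = np.argmin(total, axis=0)      # best predecessor for each candidate
        cum = total[back[t], np.arange(K)] + local_cost[t]

    # Viterbi back-tracing from the cheapest final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmin(cum))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[path]
```

In the paper's framing, the encoder plays the role of the state-cost estimator (`local_cost`) and the gated decoder learns the effect of the back-tracing loop, so the whole tracker becomes differentiable and can be trained jointly with the tone classifier.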