2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID SPE-9.4
Paper Title RECENT DEVELOPMENTS ON ESPNET TOOLKIT BOOSTED BY CONFORMER
Authors Pengcheng Guo, Northwestern Polytechnical University; Johns Hopkins University, China; Florian Boyer, LaBRI, University of Bordeaux; Airudit, France; Xuankai Chang, Johns Hopkins University, United States; Tomoki Hayashi, Nagoya University; Human Dataware Lab. Co., Ltd., Japan; Yosuke Higuchi, Waseda University, Japan; Hirofumi Inaguma, Kyoto University, Japan; Naoyuki Kamo, NTT Corporation, Japan; Chenda Li, Shanghai Jiao Tong University, China; Daniel Garcia-Romero, Jiatong Shi, Johns Hopkins University, United States; Jing Shi, Institute of Automation, Chinese Academy of Sciences, China and Johns Hopkins University, United States; Shinji Watanabe, Johns Hopkins University, United States; Kun Wei, Northwestern Polytechnical University, China; Wangyou Zhang, Shanghai Jiao Tong University, China; Yuekai Zhang, Johns Hopkins University, United States
Session SPE-9: Speech Recognition 3: Transformer Models 1
Location Gather.Town
Session Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Poster
Topic Speech Processing: [SPE-LVCR] Large Vocabulary Continuous Recognition/Search
Abstract In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer, a convolution-augmented Transformer. This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open-source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.