Advancing Technological Equity in Speech and Language Processing
Accelerating advances in AI and deep neural networks have powered the proliferation of speech and language technologies in applications such as virtual assistants, smart speakers and reading machines. The technologies have performed impressively well, achieving human parity in speech recognition accuracy and speech synthesis naturalness. As these technologies continue to permeate our daily lives, they need to support diverse users and usage contexts with inputs that deviate from the mainstream. Examples include non-native speakers, code-switching, speech carrying myriad emotions and styles, and speakers with impairments and disorders. In such contexts, existing technologies often suffer performance degradation and fail to fulfill the needs of the users. The crux of the problem lies in data scarcity and data sparsity, which are exacerbated by high data variability.
This talk presents an overview of some of the approaches we have used to address the challenges of data shortage, positioned at various stages along the processing pipeline. They include: data augmentation based on speech signal perturbations, use of pre-trained representations, learning speech representation disentanglement, knowledge distillation architectures, meta-learned model re-initialization, as well as adversarially trained models. The effectiveness of these approaches is demonstrated through a variety of applications, including accented speech recognition, dysarthric speech recognition, code-switched speech synthesis, disordered speech reconstruction, one-shot voice conversion and exemplar-based emotive speech synthesis. These efforts strive to develop speech and language technologies that can gracefully adapt to and accommodate a diversity of user needs and usage contexts, in order to achieve technological equity in our society.
Helen Meng is Patrick Huen Wing Ming Professor of the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. She received the S.B., S.M. and Ph.D. degrees, all in Electrical Engineering from MIT. She joined The Chinese University of Hong Kong in 1998, and established the Human-Computer Communications Laboratory in her department in 1999. In 2005, she founded the Microsoft-CUHK Joint Laboratory for Human-Centric Computing and Interface Technologies and serves as Director. This laboratory has been recognized as a Ministry of Education (MoE) of China Key Laboratory since 2008. In 2006, she founded the Tsinghua-CUHK Joint Research Centre for Media Sciences, Technologies and Systems. In 2007, she helped establish the Laboratory for Ambient Intelligence and Multimodal Systems in the Chinese Academy of Sciences Shenzhen Institute of Advanced Technology, through its joint initiative with CUHK. In 2013, she established the CUHK Stanley Ho Big Data Decision Analytics Research Center and serves as its Founding Director. Helen served as Associate Dean (Research) of the Faculty of Engineering between 2006 and 2010, as well as Department Chairman between 2012 and 2018.
Professor Meng is a recognized scholar in the fields of multilingual speech and language processing, multimodal human-computer interaction and Big Data analytics. She leads the interdisciplinary research team that received the first Theme-based Research Scheme Project in Artificial Intelligence in 2019. She is a Fellow of the IEEE, elected in 2013 “for contributions to spoken language and multimodal systems”; a Fellow of the International Speech Communication Association (one of 62 worldwide, 8 from Asia), elected in 2016 “for contributions to multilingual, multimodal human-computer interaction and language learning technologies”; and a Fellow of the Hong Kong Institution of Engineers (HKIE) and Hong Kong Computer Society (HKCS). Her recent awards include the 2018 CogInfoComm Best Paper Award, 2017 Outstanding Women Professional Award (one of 20 since 1999), 2016 Microsoft Research Outstanding Collaborator Award (one of 32 academics worldwide), 2016 IBM Faculty Award, 2016 IEEE ICME Best Paper Award, 2015 ISCA Distinguished Lecturer, 2015 HKCS inaugural Outstanding ICT Women Professional Award and 2012 Asia-Pacific Signal and Information Processing Association (APSIPA) inaugural Distinguished Lecturer.
Professor Meng devotes much effort to professional service both regionally and internationally. She was elected Editor-in-Chief (2009-2011) of the IEEE Signal Processing Society (SPS) Transactions on Audio, Speech and Language Processing (often regarded as the most prestigious journal in the field) and a member of the IEEE SPS Board of Governors (2014-2016), and currently serves on the Nominations and Appointments Committee (2017-2018). Helen was also an elected member of the International Speech Communication Association (ISCA) Board (2007-2015) and currently serves on its International Advisory Council (2017-2020). She was Technical Chair of ISCA’s flagship conference INTERSPEECH 2014 and will be General Chair of INTERSPEECH 2020. She is the recipient of the 2019 IEEE Leo L. Beranek Meritorious Service Award “for exemplary service to and leadership in the Signal Processing Society”.