2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Tutorials

Tutorial Details

1 - Reconfigurable Intelligent Surfaces for Future Wireless Communications

Alessio Zappone, Marco Di Renzo, Shi Jin, and Merouane Debbah

Description: The tutorial will first discuss the most recent standardization activities for 5G networks and then explain why 5G technologies will not be able to keep pace for long with the exponential increase in end-user demand for data connectivity, thus motivating the need to go beyond 5G. Then, the fundamentals of reconfigurable intelligent surfaces (RIS) will be introduced, and the latest results on the modeling and design of RIS-based wireless networks will be presented. Propagation channel models and network deployment approaches for RIS-based systems will be discussed, and the best-performing algorithms for optimizing the radio resources of RIS-based networks will be explained, including optimal power allocation, beamforming strategies, and RIS configuration. The tutorial will also present the latest experimental results on RIS-based wireless communications. In particular, the validation activities conducted using the RIS-based hardware simulation platform developed by Prof. Shi Jin’s group at Southeast University will be discussed. Finally, the tutorial will summarize the key points and highlight the most relevant research directions and open problems to be investigated on the way to beyond-5G RIS-based wireless networks.

2 - DeepFake Generation and Detection

Siwei Lyu

Description: Images, videos, and audio recordings that are created or manipulated by AI algorithms, in particular deep neural networks (DNNs), are a recent twist on the disconcerting problem of online disinformation. This AI-based fake content, hereafter referred to as DeepFakes, ranges from realistic images generated or edited with generative adversarial network (GAN) models, to face-swapping videos created with auto-encoder network models (the origin of the name), to indistinguishable human voices created with recurrent neural network models.

The escalating concerns over the potential impact of DeepFakes have spurred rapid progress on DeepFake detection in recent years, with promising performance reported on large-scale evaluation datasets. This tutorial will cover the fundamentals of the generation and detection of DeepFakes and of other counter-technologies, and will give the audience a comprehensive overview of the state of the art in these areas.

3 - Assessment of Visual Signals: From Technical Quality to Aesthetic Quality

Leida Li and Weisi Lin

Description: Image and video data account for more than 80% of big data, and the proportion is still rising. With the explosive growth of visual data, quality assessment is becoming increasingly important, with extensive applications in image/video processing, imaging system design and optimization, and smart photography. Visual quality assessment can be categorized into technical quality assessment (TQA) and aesthetic quality assessment (AQA). TQA mainly evaluates visual distortions, including noise, sharpness, contrast change, etc. By contrast, AQA focuses on higher-level aesthetic factors, including the rule of thirds, depth of field, color harmony, etc. In recent years, both TQA and AQA have attracted wide interest. In this tutorial, we will first give a comprehensive and up-to-date review of recent advances in image/video TQA, together with a discussion of its applications. Then, we will introduce recent research progress on AQA, including generic image aesthetic assessment (GIAA) and personalized IAA (PIAA). For PIAA in particular, we will discuss two issues that are especially significant to it: rating distribution prediction and personality-assisted aesthetic assessment. We will also discuss some emerging topics in TQA/AQA, including the generalization capability of quality models, the relation between TQA and AQA, and aesthetics-assisted image editing.

4 - Graph signal processing for machine learning: A review and new perspectives

Xiaowen Dong, Dorina Thanou, Laura Toni, Michael Bronstein, and Pascal Frossard

Description: The effective representation, processing, analysis, and visualization of large-scale structured data, especially data related to complex domains such as networks and graphs, is one of the key questions in modern machine learning. Graph signal processing (GSP), a vibrant branch of signal processing models and algorithms that aims at handling data supported on graphs, opens new paths of research to address this challenge. In this tutorial, we review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms. In particular, our discussion focuses on the following three aspects: exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability. Furthermore, we provide new perspectives on the future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other. Cross-fertilization across these different disciplines may help address the numerous challenges of complex data analysis in the modern age.
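As a rough illustration of what a graph filter does, the sketch below applies a simple polynomial low-pass filter, (I - alpha L)^order, to a signal on a small path graph, where L = D - A is the combinatorial Laplacian. The function names, the graph, and the values of `alpha` and `order` are illustrative assumptions and are not taken from the tutorial; the choice alpha = 0.25 assumes an unweighted graph with maximum degree 2, so that the filter does not amplify any graph frequency.

```python
def laplacian_apply(adjacency, x):
    """Return L x, with L = D - A, for an undirected graph given as adjacency lists."""
    return [len(nbrs) * x[i] - sum(x[j] for j in nbrs)
            for i, nbrs in enumerate(adjacency)]


def low_pass_filter(adjacency, x, alpha=0.25, order=3):
    """Apply the polynomial graph filter (I - alpha * L)^order to the signal x."""
    for _ in range(order):
        lx = laplacian_apply(adjacency, x)
        x = [xi - alpha * li for xi, li in zip(x, lx)]
    return x


def laplacian_quadratic_form(adjacency, x):
    """Smoothness measure x^T L x: small values mean neighbours carry similar values."""
    return sum(xi * li for xi, li in zip(x, laplacian_apply(adjacency, x)))


# Hypothetical example: a 5-node path graph with a rapidly oscillating signal.
path_graph = [[1], [0, 2], [1, 3], [2, 4], [3]]
noisy_signal = [0.0, 1.0, 0.0, 1.0, 0.0]
smoothed_signal = low_pass_filter(path_graph, noisy_signal)
```

The filter leaves constant signals untouched (they lie in the null space of L) and attenuates components that vary quickly across edges, which is the sense in which it is "low-pass" on the graph.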

5 - Audio-Visual Speech Enhancement and Separation Based on Deep Learning

Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen, and Dong Yu

Description: During a conversation, humans use both sight and hearing to focus on the speaker of interest. Despite this evidence, traditional speech enhancement and separation algorithms rely only on acoustic speech signals. Although advances in deep learning have allowed these algorithms to reach high performance, speech enhancement and separation systems still struggle when the background noise level is high, because they are limited to a single modality. Recent works have therefore investigated the possibility of including visual information from the speaker of interest to perform speech enhancement and separation. In this tutorial, we will provide an overview of deep-learning-based techniques used for audio-visual speech enhancement and separation. Specifically, we will consider how the field evolved from the first single-microphone speaker-dependent systems to the current state of the art. In addition, several demos developed to showcase our research in the field will be shown. The tutorial is intended to highlight the potential of this emerging research topic with two aims: helping beginners navigate the large number of approaches in the literature, and inspiring experts by providing insights and perspectives on current challenges and possible future research directions.

6 - Localization-of-Things: New Opportunities in Signal Processing

Moe Z. Win and Andrea Conti

Description: The availability of real-time high-accuracy location awareness is essential for current and future wireless applications, particularly those involving Internet-of-Things and 5G communication networks. Reliable localization and navigation of people, objects, and vehicles – Localization-of-Things – is a critical component for a diverse set of applications including connected communities, smart environments, vehicle autonomy, asset tracking, medical services, military systems, and crowd sensing. The coming years will see the emergence of network localization and navigation in challenging environments with sub-meter accuracy and minimal infrastructure requirements. We will discuss the limitations of traditional positioning, and move on to the key enablers for high-accuracy location awareness: wideband transmission and cooperative processing. Topics covered will include: fundamental bounds, cooperative algorithms, and network experimentation. Fundamental bounds serve as performance benchmarks, and as a tool for network design. Cooperative algorithms are a way to achieve dramatic performance improvements compared to traditional non-cooperative positioning. To harness these benefits, system designers must consider realistic operational settings; thus, we present the performance of cooperative localization based on measurement campaigns.

7 - GPU-Acceleration of Signal Processing Workflows from Python

Adam Thompson

Description: In this tutorial, we will introduce developers and users alike to the cuSignal API and demonstrate its performance in both online and offline signal processing workflows. We will also demonstrate how to connect cuSignal to the PyTorch deep learning framework to begin deep learning training and inference tasks without data leaving the GPU. Further, we will devote a significant amount of time to teaching attendees how to build their own GPU kernels within Python. We will provide examples and best practices on how to transition from standard Python code to fast Numba CUDA kernels, how to profile the result, and how to then implement custom CuPy CUDA kernels for optimum performance. Throughout the tutorial, we will discuss cost-benefit tradeoffs, including the developer learning curve and anticipated performance. Our goal for the tutorial is to demonstrate the ease and flexibility of creating and implementing GPU-based high-performance signal processing workloads from Python.

8 - Meta Learning and its applications to Human Language Processing

Hung-yi Lee, Ngoc Thang Vu, and Shang-Wen Li

Description: Deep-learning-based human language technology (HLT) has become the mainstream of research in recent years and significantly outperforms conventional methods. However, deep learning models are notorious for being data and computation hungry. These downsides limit the deployment of such models to different languages, domains, or styles, since collecting in-genre data and training models from scratch are costly, and the long-tail nature of human language makes the challenges even greater. A typical machine learning algorithm, e.g., deep learning, can be considered a sophisticated function that takes training data as input and returns a trained model as output. Today, learning algorithms are mostly human-designed and need a large amount of labeled training data to learn. One method that could potentially overcome these challenges is Meta Learning, also known as ‘Learning to Learn’, which aims at learning the learning algorithm itself, including better parameter initialization, optimization strategy, network architecture, distance metrics, and beyond. In several HLT areas, Meta Learning has shown high potential to allow faster fine-tuning, convergence to better performance, and few-shot learning. The goal of this tutorial is to introduce Meta Learning approaches and review the work applying this technology to HLT.
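To make the idea of "learning a better parameter initialization" concrete, here is a minimal sketch using Reptile, a simple first-order meta-learning method chosen here only for illustration; the abstract does not say which algorithms the tutorial covers. The toy tasks (fitting the slope of y = a x), the function names, and all hyperparameters are assumptions.

```python
def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def adapt(w, data, lr=0.1, steps=5):
    """The 'learning algorithm': a few gradient steps starting from the initialization w."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def make_task(slope, xs=(0.5, 1.0, 1.5)):
    """Hypothetical task: noiseless samples of y = slope * x."""
    return [(x, slope * x) for x in xs]


def reptile(task_slopes, meta_lr=0.5, epochs=50):
    """Meta-learn an initialization by nudging it toward each task's adapted weight."""
    w0 = 0.0
    for _ in range(epochs):
        for slope in task_slopes:
            w_task = adapt(w0, make_task(slope))
            w0 += meta_lr * (w_task - w0)
    return w0


meta_init = reptile([2.0, 3.0, 4.0])

# Compare one-step fine-tuning on an unseen task from the meta-learned
# initialization and from a naive initialization at zero.
new_task = make_task(3.5)
loss_from_meta = loss(adapt(meta_init, new_task, steps=1), new_task)
loss_from_zero = loss(adapt(0.0, new_task, steps=1), new_task)
```

Because the meta-learned initialization ends up inside the range of training-task slopes, a single gradient step on the new task starts much closer to its solution than a step from zero does, which is the few-shot benefit the description refers to.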

9 - Model-based deep learning in signal processing

Yonina C. Eldar, Demba Ba, Bahareh Tolooshams, and Nir Shlezinger

Description: Over the past several years, deep learning, or more generally artificial intelligence, has spurred overwhelming research interest and attracted unprecedented attention, leading to systems with far better performance than previous methods in areas such as computer vision, speech processing, and more. Standard deep learning techniques rely on vast training data to tune the weights of large general-purpose deep networks. These networks are context-agnostic and derive their power from the extent of the training data. On the other hand, signal processing and communication algorithms are typically derived from adequate models of the system, the data acquisition, and more. When applying AI to problems in communications, array processing, and more general signal processing, it is therefore natural to seek deep learning techniques that can efficiently leverage these signal models and priors. In our labs, we have been focusing on various approaches to model-based deep networks, which rely on underlying models of the system and data in order to develop deep networks tailored to the specific problem at hand. We are very interested in understanding these types of networks theoretically and also in exploring their applications to communications, radar, imaging, medical applications, microscopy and more.

10 - Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization

Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, and Shinji Watanabe

Description: Recognizing unsegmented conversational speech recorded with distant microphone(s) is a challenging but essential task that must be solved to unlock a myriad of new speech applications, such as a communication agent that can understand, respond to, and facilitate our conversation. This task comprises a number of subtasks, which have been studied rather independently for a decade, such as multichannel/single-channel source separation, speaker diarization with source number counting, and conversational speech recognition. This tutorial first revisits, with demonstrations, current state-of-the-art systems for this task, which were developed for evaluation campaigns such as the CHiME-5 and CHiME-6 challenges and for commercial products. These systems typically consist of a combination of well-established, independently optimized modules. While these systems are carefully designed to consolidate these independent modules, there is still considerable room for improvement. In the latter part of the tutorial, we introduce a recent research trend that aims to establish an optimal joint neural system that solves those subtasks all together, through end-to-end optimization based on a common integrated objective. By showing the potential of such jointly optimal systems, which are now starting to outperform previous top-performing systems in many tasks, we discuss the future directions and challenges for this task from both industry and academic perspectives.

11 - Efficient Global Optimization and its Application to Wireless Interference Networks

Bho Matthiesen and Eduard A. Jorswieck

Description: Global optimization is concerned with obtaining the solution of nonconvex optimization problems. Algorithms for such problems can mostly be categorized into outer approximation algorithms and branch and bound (BB) methods. This tutorial will focus on BB methods for continuous optimization and demonstrate that they are among the most versatile tools in global optimization theory. We take a modular approach to BB and cover the aspects of rectangular subdivision, selection, bounding, and feasibility testing, from both a theoretical and a practical perspective. The focus of the bounding part is on exploiting partial monotonicity in the problem, which leads to the novel mixed monotonic programming (MMP) framework, a generalization of classical monotonic optimization (MO). Common feasibility checks are discussed, and we highlight some pitfalls that lead to slow convergence. The successive incumbent transcending (SIT) scheme is introduced as a remedy, and its integration with BB is discussed. A notable side effect of the SIT scheme is that it also improves numerical stability when dealing with complicated feasible sets. The theory developed in the first part of this tutorial will be applied in several case studies from the area of resource allocation for wireless interference networks.
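The interplay of rectangular subdivision, selection, and bounding can be sketched on a toy two-user power control problem. The sketch assumes the usual mixed monotonic setup: the objective is written as f(p) = F(p, p), where F(x, y) is non-decreasing in x and non-increasing in y, so that F(hi, lo) upper-bounds f on the box [lo, hi]. The channel gain `A`, the power budget `P_MAX`, the function names, and the tolerance are illustrative assumptions; feasibility testing and the SIT scheme are not shown.

```python
import heapq
import math

A = 2.0        # hypothetical cross-channel gain (assumption)
P_MAX = 10.0   # hypothetical per-user power budget (assumption)


def mm_function(x, y):
    """F(x, y): non-decreasing in x, non-increasing in y; F(p, p) is the sum rate."""
    return (math.log2(1.0 + x[0] / (1.0 + A * y[1]))
            + math.log2(1.0 + x[1] / (1.0 + A * y[0])))


def sum_rate(p):
    """Nonconvex objective: sum rate of two mutually interfering links."""
    return mm_function(p, p)


def branch_and_bound(eps=1e-2, max_iter=200_000):
    lo, hi = (0.0, 0.0), (P_MAX, P_MAX)
    best_pt = (0.5 * P_MAX, 0.5 * P_MAX)
    best_val = sum_rate(best_pt)
    # Max-heap on the upper bound (stored negated): best-first selection.
    heap = [(-mm_function(hi, lo), lo, hi)]
    for _ in range(max_iter):
        if not heap:  # every box was pruned, so the incumbent is optimal
            return best_val, best_pt, best_val
        neg_ub, lo, hi = heapq.heappop(heap)
        upper = max(-neg_ub, best_val)
        if upper - best_val <= eps:
            return best_val, best_pt, upper
        # Rectangular subdivision: bisect the box along its longest edge.
        k = 0 if hi[0] - lo[0] >= hi[1] - lo[1] else 1
        mid = 0.5 * (lo[k] + hi[k])
        for a, b in ((lo[k], mid), (mid, hi[k])):
            c_lo = tuple(a if i == k else lo[i] for i in range(2))
            c_hi = tuple(b if i == k else hi[i] for i in range(2))
            # Incumbent update from a feasible point (the box centre).
            centre = tuple(0.5 * (u + v) for u, v in zip(c_lo, c_hi))
            val = sum_rate(centre)
            if val > best_val:
                best_val, best_pt = val, centre
            # Bounding: keep the box only if it could beat the incumbent.
            ub = mm_function(c_hi, c_lo)
            if ub > best_val:
                heapq.heappush(heap, (-ub, c_lo, c_hi))
    raise RuntimeError("iteration limit reached before the gap closed")


best_val, best_pt, upper_bound = branch_and_bound()
```

On return, the incumbent value is within `eps` of the returned upper bound, which is the kind of optimality certificate that distinguishes BB from local search.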

12 - Underdetermined Wideband Direction of Arrival Estimation and Target Localization

Wei Liu

Description: In this tutorial, we will first give an introduction to the general area of wideband array signal processing, including both beamforming and direction of arrival (DOA) estimation, and answer a key basic question: what is wideband, and how wide is wideband? Then, we will briefly review some traditional narrowband approaches to DOA estimation and show how they are extended to the wideband case. Thirdly, we will focus on the underdetermined DOA estimation problem. The approaches for underdetermined wideband DOA estimation are divided into two classes: one is a direct extension of the underdetermined narrowband DOA estimation approaches to the wideband case in the frequency domain; the other consists of approaches unique to the wideband case, where the focus is on how to exploit the frequency diversity of the signals to increase the degrees of freedom (DOFs) of the system. Theoretical analyses based on the derived Cramér-Rao bounds (CRBs) will be provided to show the increased DOFs in theory. Finally, the tutorial will move to the underdetermined wideband target localization problem; the main difference from the DOA estimation problem is that here multiple distributed wideband arrays are employed to uniquely identify the locations of targets.

13 - Deep generative modeling of sequential data with dynamical variational autoencoders

Simon Leglaive, Xavier Alameda-Pineda, and Laurent Girin

Description: Dynamical variational autoencoders (DVAEs) combine standard variational autoencoders (VAEs) with a temporal model in order to achieve unsupervised representation learning for sequential data. The temporal model typically comes from the combination of traditional state-space models (SSMs) with feed-forward neural networks, or from the use of recurrent neural networks (RNNs). DVAEs can be used to process sequential data at large, leveraging the efficient training methodology of standard VAEs. The objective of this tutorial is to provide a comprehensive analysis of the DVAE-based methods that have been proposed in the literature to model the dynamics between latent and observed sequential data. We will discuss the limitations of well-known models (VAEs, RNNs, SSMs), the challenges of extending linear dynamical models to deep dynamical ones, and the various models that have been proposed in the machine learning and signal processing literature. Importantly, we will show that these models can be encompassed in a general unifying framework, of which each of the above-mentioned models can be seen as a particular instance. We will also demonstrate the use of DVAEs on real-world data, in particular for generative modeling of speech signals.

14 - Signal Processing for Mass Testing in Fighting a Pandemic: A Sampling Theory Perspective

Weiyu Xu, Ajit Rajwade, Chandra R. Murthy, and Jonathan Scarlett

Description: The COVID-19 pandemic has caused significant damage to human society. Mass testing is vital in fighting the ongoing pandemic and any future one. However, testing capacity is often limited, with shortages of testing facilities and reagents. Tests can also be slow, costly, heterogeneous, and even inaccurate. In this tutorial, we view mass testing from a sampling theory perspective and introduce recent advances in signal processing theories/methods to expand test capacity, reduce test cost, and increase test reliability. These include novel compressed sensing methods for virus testing using quantitative Polymerase Chain Reaction (qPCR) machines to increase test throughput and reduce test cost, possibly further aided by family-based or contact-trace-based side information; novel error-correcting signal processing methods to improve test reliability; and optimal allocation of testing (sampling) resources across communities to best contain disease spread through an exploration-exploitation tradeoff in testing. We will provide system modeling, algorithm designs, and performance analysis for these methods, and develop the underlying mathematical theories. We will demonstrate the impact of these methods on real clinical applications, before introducing open research questions inspired by clinical constraints/applications and future research directions. The audience will be exposed to state-of-the-art analytical and software tools for matrix design, decoding algorithms and analysis of group testing/compressed sensing, and sequential decisions in mass testing. This tutorial will not only demonstrate the power of signal processing methods in fighting the pandemic, but also develop novel signal processing theories and methods and introduce intellectually inspiring fundamental research questions.
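The basic idea behind pooled (group) testing can be sketched in a few lines. The example below is a simplified, noiseless, binary version: each pool test is positive if it contains at least one infected sample, and a simple decoder (often called COMP) clears every sample that appears in a negative pool. The pooling design, function names, and infected set are illustrative assumptions; the quantitative qPCR-based compressed sensing methods mentioned in the description are more elaborate.

```python
def run_pooled_tests(pools, infected):
    """Noiseless binary model: a pool is positive if it holds any infected sample."""
    return [any(sample in infected for sample in pool) for pool in pools]


def comp_decode(num_samples, pools, outcomes):
    """Clear every sample seen in a negative pool; return the remaining candidates."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return sorted(set(range(num_samples)) - cleared)


# Hypothetical design: 8 samples, 6 pools built from the bits of the sample index.
pools = [
    [0, 1, 2, 3], [4, 5, 6, 7],   # split on the high bit
    [0, 1, 4, 5], [2, 3, 6, 7],   # split on the middle bit
    [0, 2, 4, 6], [1, 3, 5, 7],   # split on the low bit
]
outcomes = run_pooled_tests(pools, infected={5})
candidates = comp_decode(8, pools, outcomes)
print(candidates)  # [5]
```

Here six pooled tests identify the single infected sample among eight. With more infected samples the same decoder may return false positives, which is one reason matrix design and more refined decoding matter.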

15 - Signal Processing for Vehicular Sensing and Communications Coexistence

Kumar Vijay Mishra, Nuria González-Prelcic, and Bhavani Shankar M. R.

Description: The tutorial aims to shed light on coexistence scenarios beyond those considered thus far. Building on existing approaches, the tutorial focuses on emerging scenarios in collaborative and joint sensing and communications systems, particularly at mmWave frequencies and in highly dynamic vehicular environments, that would benefit from information exchange between the two systems. It presents architectures and possible methodologies for mutually beneficial coexistence, either as separate entities or as a joint module, along with some recent results. The avenues discussed in the tutorial offer rich research potential while also enabling innovative “plug and play” methodologies for coexistence.

16 - Practical Massive MIMO: Performance Bottlenecks and How to Overcome Them

Chandra R. Murthy, Ribhu Chopra, Erik G. Larsson, and Himal A. Suraweera

Description: This three-hour (half-day) tutorial discusses the causes and effects of different impairments in a practical massive MIMO system, along with techniques to mitigate them. The recent popularity of massive MIMO systems is due to the high spectral and energy efficiencies they offer, even when only simple linear signal processing techniques are employed at the base station (BS). These advantages require the availability of accurate and up-to-date channel state information (CSI) at the BS. However, the accuracy of the CSI at the BS is compromised for several reasons, e.g., pilot contamination, channel aging, and reciprocity imperfections, which severely limit the performance of these systems. In the context of millimeter wave (mmWave) communications, the hybrid analog-digital architecture and low-resolution ADCs further limit the performance. While these effects have been studied independently, in this tutorial we aim to bring them all under a common umbrella. We will first discuss the modelling of the effects of these impairments. Following this, we will use signal processing tools based on blind channel estimation and tracking to present techniques for mitigating these effects. Finally, we will describe more recent attempts at mitigating these imperfections using machine learning (ML) based methods.

17 - Wireless for Machine Learning

Carlo Fischione, Viktoria Fodor, José Mairton B. da Silva Jr., and Henrik Hellström

Description: As data generation increasingly takes place on devices without a wired connection, machine learning over wireless networks becomes critical. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting distributed machine learning services. This is creating the need for new wireless communication methods that will arguably be included in 6G. In this tutorial, we plan to give a comprehensive review of the state-of-the-art wireless methods that are specifically designed to support distributed machine learning services, namely over-the-air computation and radio resource allocation optimized for machine learning. In the over-the-air approach, multiple devices transmit simultaneously in the same time slot and frequency band to exploit the superposition property of wireless channels for gradient averaging over the air. In radio resource allocation optimized for machine learning, active learning metrics are used to evaluate the data and thereby greatly improve the assignment of radio resources. This tutorial introduces these methods, reviews the most important works, and highlights crucial open problems.
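The over-the-air idea can be illustrated with a highly idealized model: if every device transmits its gradient entry as an analog amplitude at the same time, the channel itself adds the signals, and the receiver only has to divide by the number of devices. The sketch below assumes unit-gain, perfectly synchronized channels (for example after ideal power control) and additive Gaussian receiver noise; the function name and these modeling choices are assumptions made for illustration.

```python
import random


def over_the_air_average(local_gradients, noise_std=0.0, rng=None):
    """Estimate the average gradient from one simultaneous analog transmission.

    Assumes ideal unit-gain channels, so the received signal in each
    dimension is the sum of all transmitted values plus receiver noise.
    """
    rng = rng or random.Random(0)
    num_devices = len(local_gradients)
    dim = len(local_gradients[0])
    received = []
    for j in range(dim):
        superposed = sum(g[j] for g in local_gradients)  # the channel adds the signals
        received.append(superposed + rng.gauss(0.0, noise_std))
    return [r / num_devices for r in received]


# Hypothetical local gradients from three devices.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(over_the_air_average(gradients))  # [3.0, 4.0]
```

With orthogonal access, the three devices would need three separate transmissions; here a single channel use per gradient entry suffices, at the cost of the estimate being perturbed by noise when `noise_std` is nonzero.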

18 - Acoustic Environment Synthesis for XR

Zoran Cvetkovic, Enzo De Sena, and Huseyin Hacihabiboglu

Description: Simulation and rendering of environment acoustics makes the acoustics of a virtual space audible and is essential for providing a high level of immersion in AR/VR applications; without it, sound sources are perceived inside the head. While it is possible to simulate how sound waves physically propagate, scatter and diffract in an environment, this requires significant computational resources. In many cases, it is possible, and indeed desirable, to simplify the simulation and rendering of room acoustics by leveraging the limitations of human auditory perception. This tutorial will provide an overview of the available classes of room acoustics models, with a focus on models with low computational requirements that are particularly suitable for XR applications.