2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Grand Challenges

Signal Processing Grand Challenges (SPGC) Programme
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '21)

General Chairs:

  • Konstantinos N. Plataniotis, University of Toronto
  • Xiao-Ping Zhang, Ryerson University
  • Tao Mei, Deputy Managing Director of JD AI Research
  • Arash Mohammadi, Concordia University


Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC)

Text-to-speech (TTS), or speech synthesis, has seen significant performance improvements with the help of deep learning. The latest advances in the end-to-end TTS paradigm and neural vocoders make it possible to produce very realistic, natural-sounding synthetic speech that approaches human parity. However, this remarkable ability is still limited to ideal scenarios with a large, less-expressive single-speaker training set. Speech quality, target similarity, expressiveness, and robustness remain unsatisfactory when synthesizing speech for different speakers and various styles, especially in real-world low-resource conditions, e.g., when each speaker has only a few samples at hand. Current open solutions are also not robust to unseen speakers. We call this challenging task multi-speaker multi-style voice cloning (M2VoC).

Recent advances in transfer learning, style transfer, speaker embedding and factor disentanglement have shed light on the potential solutions to low-resource voice cloning.

As an ICASSP 2021 Signal Processing Grand Challenge, the M2VoC challenge aims to provide a common sizable dataset as well as a fair testbed for benchmarking the voice cloning task. We highly encourage researchers from both academia and industry to join the challenge and engage in in-depth discussion and collaboration.

Further details: http://challenge.ai.iqiyi.com/detail?raceId=5fb2688224954e0b48431fe0

ZYELL-NCTU Network Anomaly Detection Challenge

In today’s digital age, network security is critical as billions of computers around the world are connected with each other over networks. Symantec’s Internet Security Threat Report indicates a 56% increase in the number of network attacks in 2019. Network anomaly detection (NAD) is an attempt to detect anomalous network traffic by observing traffic data over time to define what is “normal” traffic and pick out potentially anomalous behavior that differs in some way.

Signature-based or rule-based NAD is conventionally employed to identify anomalous behavior. These approaches generally fall into two categories based on the detection principle: (1) flow-based methods analyze a network connection session, which may include the connection protocol, connection time, total number of packets sent, and so forth; (2) packet-based methods analyze the content of each packet. However, signatures and rules are fundamentally insufficient for network threat detection: they can only deal with known attacks, and the differences that distinguish anomalous behavior from normal traffic are often subtle.

In recent years, deep learning methods have received much attention, since deep neural networks can learn complex patterns of anomalies directly from network traffic data. However, network traffic data are real-world data characterized by large scale, noisy labels, and class imbalance, which makes them challenging for deep learning algorithms. For example, anomalies occur rarely and the vast majority of the data is normal (anomalies typically account for only 0.001-1% of the traffic), and learning from such imbalanced data remains an open challenge.
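One common starting point for the class-imbalance problem mentioned above is to weight each class inversely to its frequency in the training data. The sketch below is only illustrative (it is not part of the challenge specification, and the function name is ours):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights, a common heuristic for
    imbalanced data: weight = total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# Toy example: 1 anomaly in 1,000 flows.
labels = ["normal"] * 999 + ["anomaly"]
w = class_weights(labels)
# The rare "anomaly" class receives a proportionally larger weight,
# which can then be passed to a weighted loss during training.
```

Such weights are typically plugged into a weighted cross-entropy loss; alternatives include over/undersampling and focal-style losses.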

The ZYELL-NCTU Network Anomaly Detection Challenge is a joint activity between research teams from the ZYELL group and National Chiao Tung University. In this challenge, we release a million-scale dataset of real-world network traffic for network anomaly detection and aim to leverage solutions across the industrial and academic communities to help advance the field of network security.

Further details: https://nad2021.nctu.edu.tw/index.html

COVID-19 Diagnosis

Novel Coronavirus (COVID-19) has overwhelmed more than 200 countries around the world, affecting millions and claiming more than 1.5 million human lives since its first emergence in late 2019. This highly contagious disease spreads easily and, if not controlled in a timely fashion, can rapidly incapacitate healthcare systems.

The main objective of the 2021 IEEE SPGC-COVID challenge is the development of fully automated frameworks to identify/classify COVID-19 infections using only volumetric chest CT scans. The introduced SPGC-COVID dataset is a large dataset of COVID-19, community-acquired pneumonia (CAP), and normal cases acquired with various imaging settings at different medical centers. The challenge is to design advanced and robust learning models that classify the given CT scans into three classes: COVID-19, CAP, and normal. The developed models need to perform accurately and robustly over such a heterogeneous set of CT scans, which includes images with different slice thicknesses, radiation doses, and noise levels. In addition to acquisition and visual variations, the SPGC-COVID dataset contains CT scans that, besides COVID-19 infections, include manifestations related to heart problems/operations.

Any team can participate in the competition and should complete their submission by March 1st, 2021. The five best teams will be selected and announced by May 25th, 2021. Three finalist teams will be judged at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021, to be held June 6-11, 2021, in Toronto, Canada. In addition to algorithmic performance, demonstration and presentation quality will also affect the final ranking.

Further details: http://i-sip.encs.concordia.ca/2021SPGC-COVID19/index.html

Acoustic Echo Cancellation Challenge: Datasets and Testing Framework

The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stimulate research in the area of acoustic echo cancellation (AEC), which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. Many recent AEC studies report reasonable performance on synthetic datasets where the train and test samples come from the same underlying distribution. However, the AEC performance often degrades significantly on real recordings. Also, most of the conventional objective metrics such as echo return loss enhancement (ERLE) and perceptual evaluation of speech quality (PESQ) do not correlate well with subjective speech quality tests in the presence of background noise and reverberation found in realistic environments.
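The echo return loss enhancement mentioned above is conventionally defined as the power ratio, in dB, between the microphone signal and the residual signal after echo cancellation during far-end single talk. A minimal sketch of that computation (variable names are ours):

```python
import math

def erle_db(mic, residual):
    """Echo return loss enhancement in dB: 10*log10 of the ratio of
    microphone-signal power to residual-echo power after cancellation,
    measured during far-end single talk."""
    p_mic = sum(x * x for x in mic) / len(mic)
    p_res = sum(x * x for x in residual) / len(residual)
    return 10.0 * math.log10(p_mic / p_res)

# Toy example: the canceller attenuates the echo by a factor of 10 in
# amplitude, i.e. a factor of 100 in power, giving 20 dB of ERLE.
mic = [1.0, -1.0, 1.0, -1.0]
residual = [0.1, -0.1, 0.1, -0.1]
# erle_db(mic, residual) -> 20.0
```

As the paragraph notes, a high ERLE alone does not guarantee good perceived quality, which is why the challenge relies on subjective P.808 ratings instead.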

In this challenge, we open source two large datasets to train AEC models under both single talk and double talk scenarios. These datasets consist of recordings from more than 2,500 real audio devices and human speakers in real environments, as well as a synthetic dataset. We open source an online subjective test framework based on ITU-T P.808 for researchers to quickly test their results. The winners of this challenge will be selected based on the average P.808 Mean Opinion Score (MOS) achieved across all different single talk and double talk scenarios.

Submission instructions

Please use Microsoft Conference Management Toolkit for submitting the results. After logging in, complete the following steps to submit the results:

  1. Choose “Create new submission” in the Author Console.
  2. Enter title, abstract and co-authors, and upload a lastname.txt file (can be empty or contain additional information regarding the submission).
  3. Compress the enhanced results files to a single lastname.zip file, retaining the same folder and file names as the blind test set (max file size: 350 MB).
  4. After creating the submission, return to the “Author Console” (by clicking on “Submissions” at the top of the page) and upload the lastname.zip file via “Upload Supplementary Material”.
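Step 3 above requires the archive to mirror the blind test set's folder and file names. A minimal packaging sketch using Python's standard `zipfile` module (the directory and archive names are illustrative only):

```python
import zipfile
from pathlib import Path

def pack_results(results_dir, out_zip):
    """Zip all enhanced clips under results_dir, storing each file
    under its path relative to results_dir so the folder and file
    names match the blind test set layout."""
    results_dir = Path(results_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(results_dir.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(results_dir))

# Example (hypothetical paths):
# pack_results("enhanced_results", "lastname.zip")
```

Remember the 350 MB size limit; `ZIP_DEFLATED` compression helps, but audio files compress poorly, so check the resulting file size before uploading.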

Submission deadline: Oct 9, 2020, 11:59pm (anywhere on Earth)

For questions, please contact aec_challenge@microsoft.com

Further details: https://www.microsoft.com/en-us/research/academic-program/acoustic-echo-cancellation-challenge-icassp-2021/

Deep Noise Suppression Challenge

The ICASSP 2021 Deep Noise Suppression (DNS) challenge is designed to foster innovation in the field of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH 2020, where we open sourced training and test datasets for researchers to train their noise suppression models, along with a subjective evaluation framework that was used to evaluate submissions and pick the final winners. Many researchers from academia and industry made significant contributions to push the field forward. The results of the INTERSPEECH DNS challenge show that we still have a long way to go in achieving superior speech quality in challenging noisy conditions. In this challenge, we will add over 20 hours of clean speech with singing and provide more information about the characteristics of the noise based on stationarity. We will also provide over 100,000 synthetic and real room impulse responses (RIRs) curated from other datasets.

We will have two tracks for this challenge:

  • Real-Time Denoising track:
    The noise suppressor must take less than the stride time Ts (in ms) to process a frame of size T (in ms) on an Intel Core i5 quad-core machine clocked at 2.4 GHz or equivalent processors. For example, Ts = T/2 for 50% overlap between frames. The total algorithmic latency allowed, including the frame size T, the stride time Ts, and any look-ahead, must be less than or equal to 40 ms. For example, a frame length of 20 ms with a stride of 10 ms results in an algorithmic delay of 30 ms and satisfies the latency requirement, whereas a frame size of 32 ms with a stride of 16 ms results in an algorithmic delay of 48 ms and exceeds the 40 ms budget. If your frame size plus stride, T1 = T + Ts, is less than 40 ms, then you can use up to (40 - T1) ms of future information.
  • Personalized Deep Noise Suppression (pDNS) track:
    • Satisfy Track 1 requirements
    • You will have access to 2 minutes of speech from a particular speaker to extract speaker-related information that might be useful for improving the quality of the noise suppressor. Enhancement must be performed on the noisy test segment of the same speaker.
    • The enhanced speech using speaker information must be of better quality than the enhanced speech produced without it.
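The latency budget for the real-time track above can be checked mechanically. A minimal sketch (the function names are ours, not part of the challenge rules):

```python
def algorithmic_latency_ms(frame_ms, stride_ms, lookahead_ms=0.0):
    """Total algorithmic latency as defined by the track rules:
    frame size T + stride Ts + any look-ahead, all in milliseconds."""
    return frame_ms + stride_ms + lookahead_ms

def satisfies_latency(frame_ms, stride_ms, lookahead_ms=0.0, budget_ms=40.0):
    """True if the configuration fits within the 40 ms budget."""
    return algorithmic_latency_ms(frame_ms, stride_ms, lookahead_ms) <= budget_ms

# 20 ms frames with a 10 ms stride: 30 ms total, within budget,
# leaving up to 10 ms of future information.
# 32 ms frames with a 16 ms stride: 48 ms total, over budget.
```

The remaining headroom for look-ahead is simply 40 - (T + Ts) ms, matching the (40 - T1) ms rule stated above.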

Participants are forbidden from using the blind test set to retrain or tweak their models. They must not submit clips enhanced with any speech enhancement method that is not being submitted to ICASSP 2021 by the authors. Failure to adhere to these rules will lead to disqualification from the challenge.

Please send an email to dns_challenge@microsoft.com stating that you are interested in participating in the challenge. Include the following details in your email:

  • Names of the participants and name of the team captain
  • Institution/Company
  • Email

The top three winning teams from each track will be awarded prizes as outlined in the description of the rules.

Please email us if you have any questions or need clarification about any aspect of the challenge.

Further details: https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2021/