Paper ID | MLSP-24.2 |
Paper Title |
Efficient Adversarial Audio Synthesis via Progressive Upsampling |
Authors |
Youngwoo Cho, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Minwook Chang, NCSOFT, South Korea; Sanghyeon Lee, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Hyoungwoo Lee, Gerard Jounghyun Kim, Korea University, South Korea; Jaegul Choo, Korea Advanced Institute of Science and Technology (KAIST), South Korea |
Session | MLSP-24: Applications in Audio and Speech Processing |
Location | Gather.Town |
Session Time: | Wednesday, 09 June, 16:30 - 17:15 |
Presentation Time: | Wednesday, 09 June, 16:30 - 17:15 |
Presentation | Poster |
Topic |
Machine Learning for Signal Processing: [MLR-APPL] Applications of machine learning |
Abstract |
This paper proposes a novel generative model called the progressive upsampling GAN (PUGAN), which progressively synthesizes high-quality audio in the raw-waveform domain. PUGAN builds on the earlier idea of progressively generating higher-resolution output by stacking multiple encoder-decoder architectures. Whereas WaveGAN, the existing state-of-the-art model, uses a single decoder architecture, our model generates audio signals and converts them to a higher resolution in a progressive manner, while using significantly fewer parameters than WaveGAN (e.g., 3.17x fewer for 16 kHz output). Our experiments show that audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception score and human perception. |
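The progressive generation described in the abstract can be illustrated with a toy sketch. It only shows the data flow of staged resolution doubling. A plain linear interpolator stands in for each learned upsampling module, and the function names, the 4 kHz starting rate, and the factor-of-two schedule are all assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical sketch: linear interpolation stands in for the learned
# encoder-decoder upsampling stages; rates and names are assumed.

def upsample_2x(signal):
    """Double the length of `signal` by linear interpolation between neighbours."""
    out = []
    for i, x in enumerate(signal):
        # At the final sample there is no right neighbour, so hold the value.
        nxt = signal[i + 1] if i + 1 < len(signal) else x
        out.append(x)
        out.append((x + nxt) / 2.0)
    return out


def progressive_upsample(signal, start_rate, target_rate):
    """Double the sample rate stage by stage until `target_rate` is reached.

    Returns a list of (rate, waveform) pairs, one per stage, including the input.
    """
    if start_rate <= 0 or target_rate % start_rate != 0:
        raise ValueError("target_rate must be a positive multiple of start_rate")
    ratio = target_rate // start_rate
    if ratio & (ratio - 1) != 0:
        raise ValueError("target_rate / start_rate must be a power of two")

    stages = [(start_rate, list(signal))]
    rate = start_rate
    while rate < target_rate:
        signal = upsample_2x(signal)
        rate *= 2
        stages.append((rate, signal))
    return stages


low_res = [0.0, 1.0, 0.0, -1.0]  # toy waveform at an assumed 4 kHz
for rate, wave in progressive_upsample(low_res, 4000, 16000):
    print(rate, len(wave))
```

In a trained model each stage would be a learned network rather than a fixed interpolator. The sketch is meant only to show how a low-resolution signal is refined through intermediate rates instead of being produced at 16 kHz in a single pass.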