IEEE ICASSP 2021 || Toronto, Ontario, Canada || 6-11 June 2021

Paper Detail

Paper ID

MMSP-3.4

Paper Title

SHOW AND SPEAK: DIRECTLY SYNTHESIZE SPOKEN DESCRIPTION OF IMAGES

Authors

Xinsheng Wang, Xi’an Jiaotong University, China; Siyuan Feng, Delft University of Technology, Netherlands; Jihua Zhu, Xi’an Jiaotong University, China; Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign, United States; Odette Scharenborg, Delft University of Technology, Netherlands

Session

MMSP-3: Multimedia Synthesis and Enhancement

Location

Gather.Town

Session Time:

Wednesday, 09 June, 14:00 - 14:45

Presentation Time:

Wednesday, 09 June, 14:00 - 14:45

Presentation

Poster

Topic

Multimedia Signal Processing: Emerging Areas in Multimedia

Abstract

This paper proposes a new model, referred to as the show and speak (SAS) model, which for the first time is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that describes this image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that the proposed SAS is able to synthesize natural spoken descriptions for images, indicating that synthesizing spoken descriptions for images while bypassing text and phonemes is feasible.
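
The image-to-spectrogram data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the mean-pooled encoder, the simple recurrent decoder, the fixed frame count, and every function name below are assumptions made only to show how an encoder-decoder could map image features to a sequence of spectrogram frames. The vocoder stage (WaveNet in the paper) is left out.

```python
import math
import random

# All sizes are illustrative assumptions, not values from the paper.
FEATURE_DIM = 8   # size of one image-region feature vector
HIDDEN_DIM = 6    # size of the encoder/decoder state
N_MELS = 4        # number of bins per predicted spectrogram frame


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (hypothetical)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(image_features, w_enc):
    """Project each region feature to the hidden size, then average them."""
    projected = [[math.tanh(v) for v in matvec(w_enc, f)] for f in image_features]
    count = len(projected)
    return [sum(column) / count for column in zip(*projected)]


def decode(context, w_state, w_frame, w_out, n_frames):
    """Predict frames one at a time, feeding each frame back into the next step."""
    state = list(context)
    frame = [0.0] * N_MELS  # all-zero start frame
    spectrogram = []
    for _ in range(n_frames):
        recurrent = matvec(w_state, state)
        fed_back = matvec(w_frame, frame)
        state = [math.tanh(r + f + c) for r, f, c in zip(recurrent, fed_back, context)]
        frame = matvec(w_out, state)
        spectrogram.append(frame)
    return spectrogram


def synthesize_spectrogram(image_features, n_frames=5, seed=0):
    """Image features in, list of spectrogram frames out (no text, no phonemes)."""
    rng = random.Random(seed)
    w_enc = make_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
    w_state = make_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
    w_frame = make_matrix(HIDDEN_DIM, N_MELS, rng)
    w_out = make_matrix(N_MELS, HIDDEN_DIM, rng)
    context = encode(image_features, w_enc)
    return decode(context, w_state, w_frame, w_out, n_frames)


if __name__ == "__main__":
    # Three made-up region feature vectors for one image.
    features = [[0.1 * (i + j) for j in range(FEATURE_DIM)] for i in range(3)]
    frames = synthesize_spectrogram(features, n_frames=5)
    print(len(frames), "frames of", len(frames[0]), "bins each")
```

The fixed `n_frames` is a simplification; a trained sequence model would normally decide for itself when the utterance ends. The returned frames would then be passed to a vocoder to obtain audio.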

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
