2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	SS-15.6
Paper Title	TEACHER-STUDENT LEARNING WITH MULTI-GRANULARITY CONSTRAINT TOWARDS COMPACT FACIAL FEATURE REPRESENTATION
Authors	Shurun Wang, Shiqi Wang, Wenhan Yang, City University of Hong Kong, China; Xinfeng Zhang, University of Chinese Academy of Sciences, China; Shanshe Wang, Siwei Ma, Peking University, China
Session	SS-15: Signal Processing for Collaborative Intelligence
Location	Gather.Town
Session Time:	Friday, 11 June, 13:00 - 13:45
Presentation Time:	Friday, 11 June, 13:00 - 13:45
Presentation	Poster
Topic	Special Sessions: Signal Processing for Collaborative Intelligence
Abstract	In this paper, we propose a novel end-to-end feature compression scheme that leverages the representation and learning capability of deep neural networks, towards analysis with intelligent front-ends that offers promising accuracy and efficiency. In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost, yielding a feature-in-feature representation. A multi-granularity constraint is further imposed as the optimization objective to make the feature compression "healthier" from the perspective of ultimate utility. More specifically, the coarse granularity level constraint considers analysis accuracy, ensuring that facial analysis remains possible with the reconstructed feature. At the fine granularity level, feature fidelity is included to preserve the original feature quality. Moreover, a latent-code-level teacher-student enhancement model is proposed to efficiently transfer the low bit-rate representation into a high bit-rate one. This strategy allows the representation cost to be adaptively shifted to decoding computations, leading to more flexible feature compression with enhanced decoding capability. We verify the effectiveness of the proposed model on facial features, and experimental results show better compression performance in terms of rate-accuracy than existing models.
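
The objective described in the abstract combines a rate term with a coarse-level (analysis accuracy) term and a fine-level (feature fidelity) term. The sketch below is only an illustration of how such a cost could be assembled. The abstract does not give the actual loss, so the cross-entropy accuracy term, the mean-squared-error fidelity term, the weights `lambda_coarse` and `lambda_fine`, and all function names are assumptions rather than the authors' formulation.

```python
import math


def rate_bits(symbol_probs):
    """Estimated code length in bits: sum of -log2(p) over the coded latent symbols."""
    return sum(-math.log2(p) for p in symbol_probs)


def cross_entropy(class_probs, label):
    """Coarse-level term (assumed): classification loss on the reconstructed feature."""
    return -math.log(class_probs[label])


def mean_squared_error(original, reconstructed):
    """Fine-level term (assumed): fidelity between original and reconstructed features."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def multi_granularity_cost(symbol_probs, class_probs, label, original, reconstructed,
                           lambda_coarse=1.0, lambda_fine=1.0):
    """Hypothetical rate-distortion cost: rate + weighted coarse and fine distortions."""
    rate = rate_bits(symbol_probs)
    coarse = cross_entropy(class_probs, label)
    fine = mean_squared_error(original, reconstructed)
    return rate + lambda_coarse * coarse + lambda_fine * fine


# Toy values: two latent symbols, a confident correct prediction, a small feature error.
cost = multi_granularity_cost(
    symbol_probs=[0.5, 0.25],
    class_probs=[1.0, 0.0],
    label=0,
    original=[1.0, 2.0],
    reconstructed=[1.0, 4.0],
    lambda_fine=0.5,
)
print(cost)
```

In this toy setting, raising `lambda_fine` penalizes feature distortion more heavily relative to the bit cost, which is one way the trade-off between rate and the two granularity levels could be tuned.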