2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: SS-15.6
Paper Title: TEACHER-STUDENT LEARNING WITH MULTI-GRANULARITY CONSTRAINT TOWARDS COMPACT FACIAL FEATURE REPRESENTATION
Authors: Shurun Wang, Shiqi Wang, Wenhan Yang, City University of Hong Kong, China; Xinfeng Zhang, University of Chinese Academy of Sciences, China; Shanshe Wang, Siwei Ma, Peking University, China
Session: SS-15: Signal Processing for Collaborative Intelligence
Location: Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation: Poster
Topic: Special Sessions: Signal Processing for Collaborative Intelligence
Abstract: In this paper, we propose a novel end-to-end feature compression scheme that leverages the representation and learning capability of deep neural networks, enabling analysis on intelligent front-end devices with promising accuracy and efficiency. In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost to achieve a feature-in-feature representation. A multi-granularity constraint is further imposed as the optimization objective to make the feature compression "healthier" from the perspective of ultimate utility. More specifically, analysis accuracy is considered in the coarse granularity level constraint, ensuring that facial analysis remains feasible with the reconstructed feature. At the fine granularity level, feature fidelity is incorporated to preserve the original feature quality. Moreover, a latent code level teacher-student enhancement model is proposed to efficiently transfer the low bit-rate representation into a high bit-rate one. This strategy further allows us to adaptively shift representation cost to decoding computation, leading to more flexible feature compression with enhanced decoding capability. We verify the effectiveness of the proposed model on facial features, and experimental results show better compression performance in terms of rate-accuracy compared with existing models.
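The abstract describes an objective that combines a rate term with two utility constraints: a coarse-granularity term tied to downstream analysis accuracy and a fine-granularity term tied to feature fidelity. The sketch below illustrates one plausible form of such a combined loss; the function name, the cross-entropy/MSE choices, and the weighting parameters are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def multi_granularity_loss(orig_feat, recon_feat, task_logits, labels,
                           rate_bits, lam_rate=0.01, lam_coarse=1.0, lam_fine=1.0):
    """Hypothetical sketch of a multi-granularity objective:
    rate cost + coarse-level analysis accuracy + fine-level feature fidelity.
    All weights (lam_*) are illustrative, not from the paper."""
    # Fine granularity: feature fidelity, here MSE between original and
    # reconstructed feature vectors.
    fine = np.mean((orig_feat - recon_feat) ** 2)

    # Coarse granularity: analysis accuracy, here cross-entropy of the
    # downstream task predictions made from the reconstructed feature.
    shifted = task_logits - task_logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    coarse = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

    # Rate term: bits spent on the compact latent code.
    return lam_rate * rate_bits + lam_coarse * coarse + lam_fine * fine
```

With a perfect reconstruction, confident correct predictions, and zero rate, the loss approaches zero; any degradation in fidelity, accuracy, or rate raises it, which is the trade-off the rate-distortion optimization balances.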