Paper ID | MLSP-9.2 | ||
Paper Title | A LARGE-DIMENSIONAL ANALYSIS OF SYMMETRIC SNE | ||
Authors | Charles Séjourné, Romain Couillet, Pierre Comon, GIPSA-Lab, University Grenoble Alpes, France | ||
Session | MLSP-9: Learning Theory for Neural Networks | ||
Location | Gather.Town | ||
Session Time | Tuesday, 08 June, 16:30 - 17:15 | ||
Presentation Time: | Tuesday, 08 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Machine Learning for Signal Processing: [MLR-LEAR] Learning theory and algorithms | ||
Abstract | Stochastic Neighbour Embedding methods (SNE, t-SNE) aim at finding a faithful low-dimensional representation of a high-dimensional dataset. Despite their popularity, these tools are solutions to a non-convex optimization problem, and their behavior is not well understood. This work provides first answers by leveraging a large-dimensional statistics approach, in which the number n of data points and their dimension p are of the same order of magnitude. We derive and study the canonical equation satisfied by the critical points of this non-convex optimization problem. The study notably reveals that, in a simple setup, the achievable SNE solutions correspond to a subset of those critical points. In particular, when the clusters composing the dataset are balanced in size, these solutions are symmetrical and assume closed-form expressions. As a major conclusion, the analysis rigorously proves a long-standing heuristic statement on the “proper normalization” of the symmetric SNE: out of two natural normalization choices, only the claimed proper one leads to non-trivial solutions. |