ECE Fall Seminar Series: Prof. Pratik Chaudhari, University of Pennsylvania


A Picture of the Prediction Space of Deep Networks

Deep networks have many more parameters than training samples and can therefore overfit, and yet they predict remarkably accurately in practice. Training such networks is a high-dimensional, large-scale, non-convex optimization problem that should be prohibitively difficult, and yet it is quite tractable. This talk aims to illuminate these puzzling contradictions. We will argue that deep networks generalize well because of a characteristic structure in the space of learnable tasks. The input correlation matrix of typical tasks has a "sloppy" eigenspectrum: in addition to a few large eigenvalues, there is a large number of small eigenvalues distributed uniformly over a very large range. As a consequence, the Hessian and the Fisher Information Matrix of a trained network also have sloppy eigenspectra. Using these ideas, we will demonstrate an analytical, non-vacuous PAC-Bayes generalization bound for general deep networks.

We will next develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we will show that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation schemes, and weight initializations, lie on the same manifold in prediction space. We will also show that the predictions of networks trained on different tasks (e.g., different subsets of ImageNet) using different representation learning methods (e.g., supervised, meta-, semi-supervised, and contrastive learning) lie on a low-dimensional manifold as well.

References:
1. Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML '22]
2. Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, and Pratik Chaudhari. [ICML '22]
3. A picture of the space of typical learnable tasks. Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, Han Kheng Teoh, Mark Transtrum, James P. Sethna, and Pratik Chaudhari. [ICML '23]
4. The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari. 2023.
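The "sloppy" eigenspectrum described in the abstract can be illustrated with a short sketch. The synthetic data and log-uniform scales below are stand-ins chosen for illustration, not the speakers' experimental setup or code:

```python
# Sketch (assumed setup, not the authors' code): inspect the eigenspectrum of
# an input correlation matrix. Random data with log-spaced feature scales
# stands in for a real dataset such as ImageNet.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in inputs: n samples of dimension d, with feature scales spanning
# several orders of magnitude so that eigenvalues are roughly log-uniform.
n, d = 2000, 100
scales = np.logspace(0, -6, d)           # a few large, many small scales
X = rng.standard_normal((n, d)) * scales

# Input correlation (second-moment) matrix and its eigenspectrum,
# sorted from largest to smallest eigenvalue.
C = X.T @ X / n
eigs = np.sort(np.linalg.eigvalsh(C))[::-1]

# A "sloppy" spectrum decays roughly uniformly on a log scale: the gaps
# between consecutive log-eigenvalues stay close to a constant.
log_gaps = np.diff(np.log(eigs))
print(f"largest/smallest eigenvalue ratio: {eigs[0] / eigs[-1]:.2e}")
print(f"mean log-gap between eigenvalues:  {log_gaps.mean():.3f}")
```

Plotting `eigs` on a log scale would show the signature pattern the talk refers to: a handful of dominant eigenvalues followed by a long, nearly straight tail spanning many decades.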


Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a core member of the GRASP Laboratory. From 2018 to 2019, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, and his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014 to 2016. He is the recipient of the Amazon Machine Learning Research Award (2020), the NSF CAREER Award (2022), and the Intel Rising Star Faculty Award (2022).

Wednesday, September 6, 2023 at 3:00pm to 3:55pm

Colburn Lab, Room 102
University of Delaware, 150 Academy St, Newark, DE 19716-3196, USA

Event Type

Academics, College of Engineering



Electrical and Computer Engineering

Contact Name

Elizabeth Mestro
