About this Event
139 The Green
Title: Does the Data Induce Capacity Control in Deep Learning?
Abstract:
Deep networks are mysterious. These overparametrized machine learning models, trained with rudimentary optimization algorithms on non-convex landscapes in millions of dimensions, have defied attempts to put a sound theoretical footing beneath their impressive performance.
This talk aims to shed light upon some of these mysteries. The first part of the talk will employ ideas from thermodynamics and optimal transport to paint a picture of the training process of deep networks and unravel a number of peculiar properties of algorithms like stochastic gradient descent. The second part of the talk will argue that these peculiarities observed during training, as well as the anomalous generalization, may come from the data that we train upon. This part will discuss how typical datasets are "sloppy", i.e., the input correlation matrix has a strong structure and consists of a large number of small eigenvalues that are distributed uniformly over an exponentially large range. This structure is mirrored in a trained deep network in that a number of quantities, such as the Hessian, the Fisher Information Matrix, activation correlations, and Jacobians, are also sloppy. The talk will develop these concepts to demonstrate the first analytical non-vacuous generalization bound for deep networks.
This talk will discuss results from the following two papers.
1. Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [arXiv preprint, 2021] https://arxiv.org/abs/2110.14163
2. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. Pratik Chaudhari and Stefano Soatto. [ICLR ’18] https://arxiv.org/abs/1710.11029
-------------
Bio:
Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a member of the GRASP Laboratory. From 2018 to 2019, he was a Senior Applied Scientist at Amazon Web Services and a post-doctoral scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT, and his Bachelor's degree (2010) from IIT Bombay. He was a part of nuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014 to 2016.