Place: Online seminar. Please sign up for our mailing list at www.physicsmeetsml.org for the Zoom link.
Speaker: Gadi Naveh, Racah Institute of Physics, Hebrew University
Abstract: Recently, the infinite-width limit of deep neural networks (DNNs) has garnered much attention, since it provides a clear theoretical window into deep learning via mappings to Gaussian processes (GPs). In spite of its theoretical appeal, this perspective lacks a key component of finite DNNs that is at the core of their success: feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self-consistent Gaussian process theory accounting for strong finite-DNN and feature-learning effects. We apply this theory to two toy models and find excellent agreement with experiments. We further identify, both analytically and numerically, a sharp transition between a feature-learning regime and a lazy-learning regime in one of these models. We also provide numerical evidence that the assumptions required for our theory hold in a more realistic setting (a Myrtle-5 CNN trained on CIFAR-10).
Link to paper:
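For readers unfamiliar with the GP mapping mentioned in the abstract: in the infinite-width limit, a randomly initialized fully connected network defines a GP whose covariance is computed by a layer-wise kernel recursion. Below is a minimal sketch of that recursion for a ReLU network, using the standard arc-cosine kernel; the variance parameters `sw2`, `sb2` and the depth are illustrative assumptions, not the parameterization used in the speaker's models.

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sw2=2.0, sb2=0.0):
    """Infinite-width (NNGP) kernel of a fully connected ReLU network.

    X: (n, d) array of inputs. Returns the (n, n) covariance matrix of
    the network-output GP after `depth` hidden layers.
    sw2, sb2: weight/bias variances (illustrative defaults; the talk's
    toy models may be parameterized differently).
    """
    # Layer-0 kernel: input Gram matrix, scaled by input dimension.
    K = X @ X.T / X.shape[1]
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        # Angle between inputs under the current kernel.
        cos_t = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # Arc-cosine kernel recursion for the ReLU nonlinearity.
        K = sb2 + sw2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        )
    return K
```

With `sw2=2.0` and `sb2=0.0` the diagonal of the kernel is preserved across layers, which is the usual "critical" choice that keeps signal variance from exploding or vanishing with depth.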