Abstract: Recent investigations of infinitely wide deep neural networks have established foundational connections between deep nets, kernels, and Gaussian processes. Nonetheless, a gap remains in characterizing the dynamics of finite-width neural networks in common optimization settings. I’ll discuss how the choice of learning rate is a crucial factor that naturally classifies the gradient descent dynamics of deep nets into two classes (a ‘lazy’ regime and a ‘catapult’ regime), separated by a sharp transition as networks become wider. I’ll then describe the distinct phenomenological signatures of the two phases and how they are elucidated in a class of simple solvable models we analyze.
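
As a rough illustration of the kind of phenomenology referred to above (not taken from the talk itself), the sketch below trains a two-layer linear toy model with plain gradient descent on a single example and records the loss. The model, width, learning rates, and data point are all illustrative assumptions; in this toy the scalar tangent kernel at initialization is roughly 2, so a learning rate below about 1 gives monotone (‘lazy’) loss decay, while a somewhat larger one makes the loss spike before settling back down (‘catapult’).

```python
import numpy as np

def train(lr, width=4096, steps=200, seed=0):
    """Gradient descent on the toy model f = v.(u x)/sqrt(width),
    fit to a single example (x=1, y=0) with squared loss L = f^2/2."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(width)
    v = rng.standard_normal(width)
    losses = []
    for _ in range(steps):
        f = v @ u / np.sqrt(width)        # network output on x = 1
        losses.append(0.5 * f**2)
        grad_u = f * v / np.sqrt(width)   # dL/du
        grad_v = f * u / np.sqrt(width)   # dL/dv
        u -= lr * grad_u
        v -= lr * grad_v
    return np.array(losses)

# Illustrative learning rates: at init the kernel (|u|^2 + |v|^2)/width ~ 2,
# so ~1 marks the lazy/catapult boundary and ~2 the divergence threshold here.
lazy = train(lr=0.5)      # loss decreases monotonically ('lazy' regime)
catapult = train(lr=1.5)  # loss first grows, then converges ('catapult' regime)
print(lazy.max(), catapult.max(), catapult[-1])
```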