A central challenge in building a theoretical foundation for deep learning is that a network's motion through high-dimensional parameter space proceeds in discrete, finite steps along complex stochastic gradients. We circumvent this obstacle through a unifying theoretical framework based on the intrinsic symmetries embedded in a network’s architecture and a continuous-time description of the discrete numerical trajectory taken by SGD at finite learning rates. We will discuss our recent work “Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics” and two follow-up research directions in which we establish new formulations of deep learning dynamics based on Lagrangian and statistical mechanics.
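As a rough illustration of the symmetry-to-conservation-law connection at the heart of the talk (a minimal sketch with illustrative notation $\theta_A$, $\theta_B$, $\alpha$, not an excerpt from the paper): if the loss is invariant under rescaling a subset of parameters $\theta_A$, then differentiating that invariance at $\alpha = 1$ gives an orthogonality condition, and under idealized gradient flow $\dot\theta = -\nabla_\theta \mathcal{L}$ the squared norm of $\theta_A$ is conserved,
\[
\mathcal{L}(\alpha\,\theta_A,\ \theta_B) = \mathcal{L}(\theta_A,\ \theta_B)\ \ \forall\,\alpha > 0
\;\;\Longrightarrow\;\;
\langle \theta_A,\ \nabla_{\theta_A}\mathcal{L} \rangle = 0
\;\;\Longrightarrow\;\;
\frac{d}{dt}\,\|\theta_A\|^2 = -2\,\langle \theta_A,\ \nabla_{\theta_A}\mathcal{L} \rangle = 0 .
\]
Discrete SGD steps at a finite learning rate, weight decay, and momentum each break such conservation laws in structured ways, which is the "broken conservation laws" aspect of the title.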
This talk will be hybrid, held in person and online via Zoom.