Dmitriy Drusvyatskiy (University of Washington)

ECE 026

The stochastic gradient method is the workhorse algorithm for learning from data. Although originally designed for smooth optimization, the method is now routinely used to optimize functions that are neither smooth nor convex. Yet, the stochastic gradient method often works just as well on such highly irregular problems as on those that are smooth. The goal of this talk is to explain this phenomenon. Roughly speaking, we will see that minimizers of typical problems lie on a distinguished smooth manifold, which endows the stochastic gradient method with smooth dynamics up to a small-order error. This viewpoint then leads to several consequences, such as local rates of convergence, asymptotic normality of averaged iterates, and saddle point avoidance.
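To make the setting concrete, here is a minimal sketch (not from the talk) of the stochastic gradient method applied to a nonsmooth convex function, f(x) = |x|, using a subgradient corrupted by zero-mean noise as the stochastic oracle. The step-size schedule, noise level, and iteration count are illustrative choices, not prescriptions from the abstract.

```python
import random

def stochastic_subgradient(x0, stepsize, steps, seed=0):
    """Stochastic subgradient method on f(x) = |x|, a nonsmooth
    convex function whose unique minimizer is x = 0.
    Each step uses a subgradient of |x| plus Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    for k in range(1, steps + 1):
        # A subgradient of |x|: sign(x); any value in [-1, 1] works at x = 0.
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        noise = rng.gauss(0.0, 0.1)              # stochastic oracle noise
        x = x - (stepsize / k ** 0.5) * (g + noise)  # diminishing steps
    return x

x_final = stochastic_subgradient(x0=5.0, stepsize=1.0, steps=2000)
print(abs(x_final))  # iterates settle near the minimizer x = 0
```

Despite f being nondifferentiable at its minimizer, the iterates behave much as they would on a smooth problem, which is the kind of phenomenon the talk sets out to explain.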