Zeyuan Allen-Zhu, Microsoft Research
Tuesday, February 13, 2018 - 4:00pm
LOW 105
The diverse world of deep learning research has given rise to thousands of architectures for neural networks. However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants.
In this talk, we present a new stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best provable result was O(ε^{-4}) by SGD before this work.
More broadly, it finds ε-approximate local minima of any smooth nonconvex function using only O(ε^{-3.25}) stochastic gradient computations.