Date and Time:
Momentum Acceleration Under Random Gradient Noise: From Convex to Non-Convex Optimization
For many large-scale optimization and machine learning problems, first-order methods and their accelerated variants based on momentum have been a leading approach for computing low-to-medium accuracy solutions because of their cheap iterations and mild dependence on the problem dimension and data size. Even though momentum-based accelerated gradient (AG) methods proposed by Nesterov for convex optimization converges provably faster than gradient descent (GD) in the absence of noise, the comparison is no longer clear in the presence of gradient noise. In the first part of the talk, we focus on GD and AG methods for minimizing strongly convex functions when the gradient has random errors in the form of additive i.i.d. noise. We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. Our results show that AG can achieve acceleration while being more robust to random gradient errors. Our framework also leads to practical algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise. In the second part of the talk, we focus on non-convex optimization and study the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm, which is a popular variant of the stochastic gradient with momentum where a controlled and properly scaled Gaussian noise is added to the unbiased gradient estimates to steer them towards a global minimum. We obtain first-time non-asymptotic global convergence guarantees for SGHMC for both empirical risk and population risk optimization problems. Our results demonstrate that momentum-based Monte Carlo algorithms can lead to better iteration complexity as well as generalization performance compared to known guarantees for the (reversible) standard stochastic gradient Langevin Monte Carlo methods, justifying the use of momentum in the context of non-convex optimization and deep learning further.
Mert Gürbüzbalaban is an assistant professor at Rutgers University. Previously, he was a postdoctoral associate at the Laboratory for Information and Decision Systems (LIDS) at MIT. He is broadly interested in optimization and computational science driven by applications in large-scale information and decision systems. He received his B.Sc. degrees in Electrical Engineering and Mathematics as a valedictorian from Boğaziçi University, Istanbul, Turkey, the “Diplôme d’ingénieur” degree from École Polytechnique, France, and the M.S. and Ph.D. degrees in (Applied) Mathematics from the Courant Institute of Mathematical Sciences, New York University. Dr. Gürbüzbalaban received the Kurt Friedrichs Prize (given by the Courant Institute of New York University for an outstanding thesis) in 2013, Bronze Medal in the École Polytechnique Scientific Project Competition in 2006, the Nadir Orhan Bengisu Award (given by the electrical-electronics engineering department of Boğaziçi University to the best graduating undergraduate student) in 2005 and the Bülent Kerim Altay Award from the Electrical-Electronics Engineering Department of Middle East Technical University in 2001. He received funding from a variety of sources including multiple programs at the U.S. National Science Foundation.