On Large-Batch Training and Sharp Minima: A Fokker-Planck Perspective
Authors: Xiaowu Dai, Yuhua Zhu
Summary: We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD). We approximate mini-batch SGD and momentum SGD as stochastic differential equations (SDEs). We exploit the continuous-time formulation of the SDEs and the theory of Fokker-Planck equations to develop new results on the escaping phenomenon and the relationship between large batch sizes and sharp minima. In particular, we find that the solution of the stochastic process tends to converge to flatter minima regardless of the batch size in the asymptotic regime. However, the convergence rate is rigorously proven to depend on the batch size. These results are validated empirically with various datasets and models.
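
For reference, a minimal sketch of the standard SDE approximation that this line of work builds on (generic symbols, not necessarily the paper's exact formulation or scaling): the mini-batch SGD update $\theta_{k+1} = \theta_k - \eta \nabla L_B(\theta_k)$, with learning rate $\eta$, batch size $B$, and loss $L$, is modeled by the diffusion
$$d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\tfrac{\eta}{B}}\, \Sigma(\theta_t)^{1/2}\, dW_t,$$
where $\Sigma$ denotes the covariance of the per-sample gradient noise and $W_t$ a standard Wiener process. The density $\rho(\theta, t)$ of $\theta_t$ then evolves according to the associated Fokker-Planck equation
$$\partial_t \rho = \nabla \cdot (\rho\, \nabla L) + \tfrac{\eta}{2B} \sum_{i,j} \partial_{\theta_i} \partial_{\theta_j} \big( \Sigma_{ij}\, \rho \big),$$
in which the batch size enters only through the diffusion coefficient $\eta/B$; this is the mechanism by which the stationary behavior can favor flat minima while the rate of convergence depends on $B$.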