Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays
Authors: Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth
Summary: The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.
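Below is a minimal, self-contained sketch of the kind of algorithm the summary describes: asynchronous SGD where each update applies a stale gradient with a stepsize that shrinks as the delay grows. The objective, function names, and the specific stepsize rule `base_lr / (1 + delay)` are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

# Hypothetical quadratic objective f(x) = 0.5 * ||x||^2 with noisy gradients,
# used only to illustrate the update rule; not taken from the paper.
def stochastic_grad(x, rng, noise_std=0.1):
    return x + noise_std * rng.standard_normal(x.shape)

def async_sgd_simulation(dim=10, num_steps=500, num_workers=8,
                         base_lr=0.5, rng=None):
    """Simulate asynchronous SGD with delayed gradients on one machine.

    Each 'worker' holds a gradient computed at some past iterate; at every
    step one worker finishes, its (possibly stale) gradient is applied with a
    delay-adaptive stepsize, and the worker starts a new computation at the
    current iterate.
    """
    rng = rng or np.random.default_rng(0)
    x = np.ones(dim)

    # Each worker stores (gradient, iteration index at which it was computed).
    workers = [(stochastic_grad(x, rng), 0) for _ in range(num_workers)]

    for t in range(1, num_steps + 1):
        # A random worker finishes its computation.
        w = rng.integers(num_workers)
        grad, start_iter = workers[w]

        # Delay = number of updates applied since this gradient was computed.
        delay = t - 1 - start_iter

        # Delay-adaptive stepsize: take a smaller step when the gradient is
        # stale. (Illustrative choice; the paper's exact rule differs.)
        lr = base_lr / (1 + delay)

        x = x - lr * grad

        # The worker immediately starts a new gradient at the fresh iterate.
        workers[w] = (stochastic_grad(x, rng), t)

    return x

if __name__ == "__main__":
    x_final = async_sgd_simulation()
    print("final iterate norm:", np.linalg.norm(x_final))
```

Note how only the stepsize reacts to the delay; the update itself is the standard asynchronous SGD step, which matches the summary's point that the guarantees hold for the same algorithm regardless of how large individual delays get.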