1. Proximal Dogleg Opportunistic Majorization for Nonconvex and Nonsmooth Optimization (arXiv)
Author : Yiming Zhou, Wei Dai
Abstract : We consider minimizing a function consisting of a quadratic term and a proximable term which is possibly nonconvex and nonsmooth. This problem is also known as the scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence, high implementation complexity, or both. To overcome these limitations, we develop a fast and user-friendly second-order proximal algorithm. The key innovation involves building and solving a sequence of opportunistically majorized problems along a hybrid Newton direction. The approach directly uses the exact Hessian of the quadratic term, and computes the required inverse only once, eliminating the iterative numerical approximation of the Hessian that is common in quasi-Newton methods. The algorithm's convergence to a critical point is established, and a local convergence rate is derived based on the Kurdyka-Łojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results show that the proposed algorithm not only achieves faster convergence but also tends to converge to a better local optimum compared with benchmark algorithms.
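The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm: it takes the quadratic term to be q(x) = ½xᵀAx + bᵀx with A symmetric positive definite, picks g(x) = λ‖x‖₁ as an example of a proximable term, factors the exact Hessian A once and reuses the factorization every iteration, and blends a Newton direction with a gradient direction as a hypothetical stand-in for the paper's hybrid Newton/dogleg direction. The names `pdom_sketch` and `soft_threshold` and the fixed blending weight `theta` are all illustrative inventions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (an example of a proximable g)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pdom_sketch(A, b, lam, x0, n_iter=100):
    """Illustrative majorize-minimize loop for min 0.5 x'Ax + b'x + lam*||x||_1.

    The exact Hessian of the quadratic term is A; it is factored once
    (here via Cholesky) and reused every iteration, mirroring the paper's
    'compute the inverse only once' idea. No iterative Hessian
    approximation, in contrast with quasi-Newton methods.
    """
    L = np.linalg.cholesky(A)          # one-time factorization of the exact Hessian
    lip = np.linalg.norm(A, 2)         # Lipschitz constant of grad q (majorizer curvature)
    x = x0.copy()
    for _ in range(n_iter):
        grad = A @ x + b
        # Newton direction from the pre-factored Hessian: solves A d = -grad.
        d_newton = -np.linalg.solve(L.T, np.linalg.solve(L, grad))
        # Plain gradient (Cauchy) direction with step 1/lip.
        d_grad = -grad / lip
        # Hypothetical 'hybrid' direction: a dogleg-style blend of the two
        # (the paper selects its direction adaptively; a fixed weight is a placeholder).
        theta = 0.5
        d = theta * d_newton + (1 - theta) * d_grad
        # Proximal step on the nonsmooth term applied to the majorized update.
        x = soft_threshold(x + d, lam / lip)
    return x

# Example usage on a small positive definite quadratic:
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)            # symmetric positive definite Hessian
b = rng.standard_normal(5)
x_hat = pdom_sketch(A, b, lam=0.1, x0=np.zeros(5))
```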
2. Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization (arXiv)
Author : Luke Marrinan, Uday V. Shanbhag, Farzad Yousefian
Abstract : We consider the minimization of a Lipschitz continuous and expectation-valued function defined as f(x) ≜ E[f̃(x,ξ)] over a closed and convex set. Our focus lies on obtaining both asymptotic as well as rate and complexity guarantees for computing an approximate stationary point (in the Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing f_η, where f_η(x) = E_u[f(x+ηu)], u is a random variable defined on a unit sphere, and η > 0. It has been observed that a stationary point of the η-smoothed problem is a 2η-stationary point for the original problem in the Clarke sense. In such a setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework (VRG-ZO) and make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, allowing for guarantees for η-Clarke stationary solutions of the original problem; (b) computing an x that ensures that the expected norm of the residual of the η-smoothed problem is within ε requires no more than O(η⁻¹ε⁻²) projection steps and O(η⁻²ε⁻⁴) function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme (VRSQN-ZO) reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are O(η⁻⁵ε⁻²) and O(η⁻⁷ε⁻⁴), respectively.
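To make the smoothing machinery concrete, here is a minimal sketch of a projected zeroth-order scheme in the spirit of VRG-ZO. It uses the standard two-point spherical estimator of ∇f_η (which only queries f, never its gradient); the mini-batch averaging is a simplified stand-in for the paper's variance reduction, and the function names, step size, batch size, and test problem are illustrative assumptions, not the authors' choices.

```python
import numpy as np

def sphere_sample(d, rng):
    """Uniform sample from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def zo_gradient_estimate(f, x, eta, rng, batch=8):
    """Zeroth-order estimate of the gradient of the eta-smoothed function
    f_eta(x) = E_u[f(x + eta*u)], via the two-point spherical estimator
    averaged over a mini-batch (a simplified stand-in for variance reduction)."""
    d = x.size
    g = np.zeros(d)
    for _ in range(batch):
        u = sphere_sample(d, rng)
        g += (d / (2 * eta)) * (f(x + eta * u) - f(x - eta * u)) * u
    return g / batch

def vrg_zo_sketch(f, project, x0, eta=0.1, step=0.05, n_iter=200, seed=0):
    """Illustrative projected zeroth-order loop over a closed convex set."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iter):
        g = zo_gradient_estimate(f, x, eta, rng)
        x = project(x - step * g)   # projection keeps iterates in the convex set
    return x

# Usage: minimize a nonsmooth function over the unit ball.
f = lambda x: np.abs(x).sum() + 0.5 * np.linalg.norm(x - 1.0) ** 2
project = lambda x: x / max(1.0, np.linalg.norm(x))   # Euclidean projection onto unit ball
x_star = vrg_zo_sketch(f, project, x0=np.zeros(5))
```

Note the trade-off the abstract's bounds reflect: each iteration above costs 2×`batch` function evaluations and one projection, so function-evaluation complexity grows faster than projection complexity as the tolerance ε and smoothing parameter η shrink.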