- O. Fercoq, A. Gramfort and J. Salmon,
, International Conference on Machine Learning 2015. [pdf]*Mind the duality gap: safer rules for the Lasso*

*Abstract*: Screening rules allow to early discard irrelevant variables from the optimization in Lasso problems , or its derivatives, making solvers faster. In this paper, we propose new versions of the so-called safe rules for the Lasso. Based on duality gap considerations, our new rules create safe test regions whose diameters converge to zero, provided that one relies on a converging solver. This property helps screening out more variables, for a wider range of regularization parameter values. In addition to faster convergence, we prove that we correctly identify the active sets (supports) of the solutions in finite time. While our proposed strategy can cope with any solver, its performance is demonstrated using a coordinate descent algorithm particularly adapted to machine learning use cases. Significant computing time reductions are obtained with respect to previous safe rules.

- G. Papa, P. Bianchi, S. Clémençon,
, Algorithmic Learning Theory 2015.*Adaptive Sampling for Incremental Optimization using Stochastic Gradient Descent*

*Abstract*: A wide collection of popular statistical learning methods, ranging from $K$-means to Support Vector Machines through Neural Networks, can be formulated as a stochastic gradient descent (SGD) algorithm in a specific setup. In practice, the main limitation of this incremental optimization technique is due to the stochastic noise induced by the choice at random of the data involved in the gradient computation at each iteration. It is the purpose of this paper to introduce a novel implementation of the SGD algorithm, where the data subset used at a given step is not picked uniformly at random among all possible subsets but drawn from a specific adaptive sampling scheme, depending on the past iterations in a Markovian manner, in order to refine the current statistical estimation of the gradient. Beyond an algorithmic description of the approach we propose, rate bounds are established and illustrative numerical results are displayed in order to provide theoretical and empirical evidence of its statistical performance, compared to more « naive » SGD implementations. Computational issues are also discussed at length, revealing the practical advantages of the method promoted.

- P. Bianchi,
arXiv preprint [pdf]**Ergodic convergence of a stochastic proximal point algorithm**,

*Abstract*: The purpose of this paper is to establish the almost sure weak ergodic convergence of a sequence of iterates (x(n)) given by x(n+1)=(I+λ(n)A(ξ(n+1),.))−1(x(n)) where (A(s,.):s∈E) is a collection of maximal monotone operators on a separable Hilbert space, (ξ(n)) is an independent identically distributed sequence of random variables on E and (λ(n)) is a positive sequence in ℓ2∖ℓ1. The weighted averaged sequence of iterates is shown to converge weakly to a zero (assumed to exist) of the Aumann expectation E(A(ξ(1),.)) under the assumption that the latter is maximal. We consider applications to stochastic optimization problems of the form min E(f(ξ1,x)) w.r.t. x∈⋂Xi where f is a normal convex integrand and (Xi) is a collection of closed convex sets. In this case, the iterations are closely related to a stochastic proximal algorithm recently proposed by Wang and Bertsekas.

- E. Ndiaye, O. Fercoq, A. Gramfort and J. Salmon,
, arXiv preprint 2015. [pdf]*GAP Safe screening rules for sparse multi-task and multi-class models*

*Abstract*: High dimensional regression benefits from sparsity promoting regularizations. Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly the rules are said to be \emph{safe}. In this paper we derive new safe rules for generalized linear models regularized with l1 and l1/l2 norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows to discard safely more variables, in particular for low regularization parameters. The GAP Safe rule can cope with any iterative solver and we illustrate its performance on coordinate descent for multi-task Lasso, binary and multinomial logistic regression, demonstrating significant speed ups on all tested datasets with respect to previous safe rules.

- O. Fercoq, P. Bianchi,
, arXiv preprint [pdf]*A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non Separable Functions*

*Abstract*: This paper introduces a coordinate descent version of the Vu-Condat algorithm. By coordinate descent, we mean that only a subset of the coordinates of the primal and dual iterates is updated at each iteration, the other coordinates being maintained to their past value. Our method allows us to solve optimization problems with a combination of differentiable functions, constraints as well as non-separable and non-differentiable regularizers. We show that the sequences generated by our algorithm converge to a saddle point of the problem at stake, for a wider range of parameter values than previous methods. In particular, the condition on the step-sizes depends on the coordinate-wise Lipschitz constant of the differentiable function’s gradient, which is a major feature allowing classical coordinate descent to perform so well when it is applicable. We illustrate the performances of the algorithm on a total-variation regularized least squares regression problem and on large scale support vector machine problems.

- P. Bianchi, W. Hachem
, arXiv preprint [pdf]*Dynamical behavior of a stochastic forward-backward algorithm using random monotone operators*

*Abstract*: Maximal monotone operators are set-valued mappings which extend (but are not limited to) the notion of subdifferential of a convex function. The forward-backward splitting algorithm is one of the most studied algorithms for finding iteratively a zero of a sum of two maximal monotone operators A and B. The purpose of this talk is to investigate the case where operators A and B are either unobserved or difficult to handle numerically. We propose an online version of the forward-backward algorithm where at each iteration, A and B are replaced with one realization of some random maximal monotone operators whose expectations (in the Aumann sense) coincide with A and B respectively. Under the assumption of a decreasing step size, we prove that the iterates « shadow » the behavior of a continuous-time differential inclusion and converge almost surely to a zero of A+B provided some conditions. Applications to constrained convex optimization are considered.