Date: 24/10/2014

Venue: Orange Labs

Room S012, 38-40 rue du Général Leclerc

Issy-les-Moulineaux (Métro line 12 – Mairie d’Issy)

Access map

**Schedule** (see the detailed program below):

10:00 – 10:30 Welcome

10:30 – 11:00 Opening: N. Demassieux / P. Duvaut

11:00 – 12:00 Yurii Nesterov (CORE/INMA, Université Catholique de Louvain)

12:00 – 13:00 Lunch break

13:00 – 14:00 Francis Bach (INRIA/ENS)

14:00 – 14:30 Break

14:30 – 15:00 Tanguy Urvoy (Orange Labs)

15:30 – 16:00 Eric Moulines (Telecom ParisTech/LTCI)

16:00 – 16:15 Break

16:15 – 16:45 Pascal Bianchi (Telecom ParisTech/LTCI)

16:45 – 17:30 Olivier Fercoq (Telecom ParisTech/LTCI)

17:30 – 17:45 Closing

**Program:**

——————————–

**Yurii Nesterov (CORE/INMA, Université Catholique de Louvain)**

**Subgradient methods for huge-scale optimization problems** (slides)

We consider a new class of huge-scale optimization problems: problems with sparse subgradients. The most important functions of this type are piecewise linear. For problems with uniform sparsity of the corresponding linear operators, we suggest a very efficient implementation of subgradient iterations whose total cost depends logarithmically on the dimension. This technique is based on a recursive update of the results of matrix-vector products and of the values of symmetric functions. It works well, for example, for matrices with few nonzero diagonals and for max-type functions. We show that the updating technique can be efficiently coupled with the simplest subgradient methods, and we present promising results from preliminary computational experiments.
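The recursive-update idea can be illustrated with a minimal sketch (my own illustration, not code from the talk): for a max-type function f(x) = max_i (aᵢᵀx − bᵢ) with sparse rows aᵢ, changing one coordinate of x only touches a few of the products aᵢᵀx, and a binary tree of partial maxima lets each touched value be refreshed in logarithmic time instead of recomputing the full maximum.

```python
# Sketch of the logarithmic-cost update: a binary tree of partial maxima
# over the current values of the linear pieces a_i.x - b_i. When a sparse
# subgradient step changes few of these values, each one is refreshed in
# O(log m) time, and the overall max (the function value) stays current.
class MaxTree:
    """Binary tree of partial maxima over a fixed-size array of values."""
    def __init__(self, values):
        self.n = len(values)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [float("-inf")] * (2 * self.size)
        for i, v in enumerate(values):
            self.tree[self.size + i] = v
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, v):
        """Change one value; refresh ancestors in O(log n)."""
        i += self.size
        self.tree[i] = v
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def max(self):
        return self.tree[1]

# Toy usage: three linear pieces, one affected by a sparse coordinate step.
t = MaxTree([1.0, 5.0, 3.0])
t.update(0, 10.0)     # one product a_i.x changed after the step
best = t.max()        # current function value, without a full recompute
```

With m linear pieces and p nonzeros touched per step, the per-iteration cost is O(p log m) rather than O(m), which is the source of the logarithmic dependence mentioned in the abstract.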

——————————–

**Francis Bach (INRIA/ENS)**

**Beyond stochastic gradient descent for large-scale machine learning** (slides)

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations (“large n”) and each of them is large (“large p”). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and in practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems with an iteration cost similar to stochastic gradient descent. (Joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt.)
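The batch/online combination mentioned in the abstract can be sketched in the spirit of SAG (the stochastic average gradient of Le Roux, Schmidt and Bach): keep one stored gradient per data point and step along their running average, so each iteration costs the same as a stochastic gradient step. The least-squares problem, step size, and data below are my own illustrative assumptions, not the talk's experiments.

```python
import numpy as np

# Sketch of a SAG-style iteration on least squares: store the last seen
# gradient of each f_i and step along the running average. Per-iteration
# cost matches SGD (one data point touched), yet the rate is linear for
# strongly convex problems. All problem data here is illustrative.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true                      # noiseless targets for the sketch

w = np.zeros(p)
grad_memory = np.zeros((n, p))      # last seen gradient of each f_i
grad_sum = np.zeros(p)              # sum of the stored gradients
step = 0.01

for _ in range(20000):
    i = rng.integers(n)
    g_new = (X[i] @ w - y[i]) * X[i]    # grad of 1/2 (x_i.w - y_i)^2
    grad_sum += g_new - grad_memory[i]  # refresh the running sum in O(p)
    grad_memory[i] = g_new
    w -= step * grad_sum / n            # step along the average gradient
```

The only extra cost over plain SGD is the O(n·p) gradient memory; in exchange, once the memory is populated the step direction approaches the full batch gradient, which is where the linear rate comes from.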

——————————–

**Tanguy Urvoy (Orange Labs)**

**Generic bandit policies synthesis and voting bandits** (slides)

In 2013 we studied a stochastic online learning scheme with partial monitoring, where the utility of decisions is only observable through an estimation of the environment parameters. We proposed a generic pure-exploration algorithm called SAVAGE, able to cope with various utility functions, from multi-armed bandit settings to dueling bandits. The primary application of this setting was to offer a natural generalization of dueling bandits to situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd. The notion of Condorcet bandits that we introduced on that occasion has recently been studied by other research teams, who improved our algorithm and obtained interesting results.
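The Condorcet-winner notion behind voting bandits is easy to state concretely: arm i beats arm j when the crowd prefers i over j with probability above 1/2, and a Condorcet winner beats every other arm. The preference matrix below is a made-up example, not data from the talk.

```python
import numpy as np

# Illustration of the Condorcet-winner target of voting bandits.
# P[i, j] = probability that the crowd prefers arm i to arm j
# (so P[i, j] + P[j, i] = 1 and the diagonal is 1/2).
P = np.array([
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.8],
    [0.3, 0.2, 0.5],
])

def condorcet_winner(P):
    """Index of the arm beating all others, or None (cycles can occur)."""
    for i in range(len(P)):
        if all(P[i, j] > 0.5 for j in range(len(P)) if j != i):
            return i
    return None
```

A bandit algorithm such as SAVAGE only sees noisy duel outcomes, so the difficulty is identifying this winner (when it exists) from pairwise samples rather than from the true matrix.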

——————————–

**Eric Moulines (Telecom ParisTech)**

**Stochastic Proximal Algorithm with applications**

We study a stochastic version of the proximal gradient algorithm in which the gradient is analytically intractable and is approximated by Monte Carlo simulation. We derive a non-asymptotic bound on the convergence rate, and we derive conditions on the Monte Carlo batch size and the step size of the algorithm under which convergence is guaranteed. In particular, we show that the stochastic proximal gradient algorithm achieves the same convergence rate as its deterministic counterpart. We extend the analysis to a stochastic version of the fast iterative shrinkage-thresholding algorithm of Beck and Teboulle (2009), in which the gradient is approximated by Monte Carlo simulation. Contrary to the deterministic setting, where fast iterative shrinkage-thresholding is known to converge faster than the proximal gradient algorithm, our results suggest that in the stochastic setting the acceleration scheme yields no improvement. To illustrate, we apply the algorithms to the estimation of network structures from multiple realizations of a Gibbs measure.
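The basic iteration studied here can be sketched on a toy ℓ1-regularized problem (my own illustration, not the Gibbs-measure application of the talk): replace the exact gradient by a Monte Carlo average over a batch of noisy evaluations, then apply the proximal (soft-thresholding) step.

```python
import numpy as np

# Sketch of a stochastic proximal gradient iteration for
#   min_x  f(x) + lam * ||x||_1,   f(x) = 1/2 ||x - b||^2,
# where grad f(x) = x - b is observed only through noisy samples and is
# approximated by a Monte Carlo batch average. Problem data, batch size
# and step size are illustrative assumptions.
rng = np.random.default_rng(1)
p = 10
b = np.concatenate([np.ones(3), np.zeros(p - 3)])
lam, step, batch = 0.1, 0.1, 50

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(p)
for _ in range(500):
    # Monte Carlo gradient: average `batch` noisy draws of grad f(x).
    noise = rng.standard_normal((batch, p))
    grad = np.mean(x - b + noise, axis=0)
    # Proximal step with the approximate gradient.
    x = soft_threshold(x - step * grad, step * lam)
```

For this separable problem each coordinate of the exact solution is soft_threshold(b_i, lam), so the iterates should hover near 0.9 on the first three coordinates and near 0 on the rest; the batch size controls how tight that hovering is, which is exactly the batch-size/step-size trade-off the abstract refers to.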

——————————–

**Pascal Bianchi (Telecom ParisTech)**

**A primal-dual stochastic coordinate descent algorithm with application to distributed optimization** (slides)

We investigate the almost sure convergence of a stochastic coordinate descent method and discuss applications to stochastic optimization problems and to distributed optimization. In particular, we will introduce a primal-dual algorithm inspired by recent works of Vu and Condat, and provide a randomized version of the algorithm based on coordinate descent. In the case of distributed optimization, we consider a set of N agents having private composite objective functions and seeking to find a consensus on the minimum of the aggregate objective. In that case, our method yields a distributed iterative algorithm in which each agent uses both local computations and message passing in an asynchronous manner.
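The consensus setting can be illustrated with a much-simplified sketch (plain synchronous decentralized gradient averaging, not the Vu/Condat-type primal-dual iteration of the talk): each agent alternates averaging its neighbors' estimates with a gradient step on its private objective, and the estimates cluster around the minimizer of the aggregate objective.

```python
import numpy as np

# Simplified consensus sketch: N agents with private scalar objectives
#   f_i(x) = 1/2 (x - c_i)^2,
# so the aggregate minimizer is the mean of the c_i. Each round, agents
# average over the (here complete) communication graph, then take a local
# gradient step. All numbers are illustrative assumptions.
N = 4
c = np.array([1.0, 2.0, 3.0, 6.0])   # private targets; aggregate optimum = 3
x = np.zeros(N)                       # each agent's local estimate
W = np.full((N, N), 1.0 / N)          # doubly stochastic averaging matrix

for _ in range(200):
    x = W @ x                         # message passing: average neighbors
    x -= 0.1 * (x - c)                # local gradient step on f_i
```

With a constant step size the agents end up in a neighborhood of the common optimum rather than in exact consensus; the primal-dual construction of the talk is precisely what removes this kind of residual disagreement while also handling composite (nonsmooth) terms.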

———————————

**Olivier Fercoq (Telecom ParisTech)**

**Accelerated, Parallel and Proximal Coordinate Descent** (slides)

We propose a new stochastic coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal. It converges at a rate O(1/k²) when applied to a weakly convex function. The method can be used to solve the Lasso and the dual SVM problem at this accelerated rate on a parallel machine. We finally give convergence results when using non-uniform samplings.
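The basic building block of APPROX, stripped of acceleration and parallelism, is randomized proximal coordinate descent; the sketch below shows it on a Lasso objective (data and parameters are my own illustrative choices, not from the talk).

```python
import numpy as np

# Sketch of (non-accelerated, serial) randomized proximal coordinate
# descent on the Lasso
#   min_x  1/2 ||A x - b||^2 + lam * ||x||_1,
# updating one uniformly sampled coordinate per iteration. APPROX layers
# acceleration and parallel block sampling on top of this basic scheme.
rng = np.random.default_rng(2)
n, p = 50, 8
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:2] = [2.0, -1.5]
b = A @ x_true                        # noiseless observations for the sketch
lam = 0.1
L = (A ** 2).sum(axis=0)              # per-coordinate Lipschitz constants

def soft_threshold(v, t):
    """Scalar proximal operator of t * |.|."""
    return np.sign(v) * max(abs(v) - t, 0.0)

x = np.zeros(p)
r = A @ x - b                         # residual, maintained incrementally
for _ in range(5000):
    j = rng.integers(p)               # uniform coordinate sampling
    g = A[:, j] @ r                   # partial derivative along coordinate j
    x_new = soft_threshold(x[j] - g / L[j], lam / L[j])
    r += A[:, j] * (x_new - x[j])     # O(n) residual update, no full A x
    x[j] = x_new
```

Maintaining the residual r makes each coordinate update cost O(n) instead of O(np), which is what makes coordinate methods attractive at scale; non-uniform sampling, as in the last part of the talk, would replace the uniform draw of j with probabilities adapted to the L[j].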
