Workshop

Date: 24/10/2014
Location: Orange Labs
Room S012, 38-40 rue du Général Leclerc
Issy-les-Moulineaux (Métro 12 – Mairie d'Issy)
Access map

Schedule (see the detailed programme below):
10h – 10h30 Welcome
10h30 – 11h Opening: N. Demassieux / P. Duvaut
11h – 12h Yurii Nesterov (CORE/INMA, Université Catholique de Louvain)
12h – 13h Lunch break
13h – 14h Francis Bach (INRIA/ENS)
14h – 14h30 Break
14h30 – 15h00 Tanguy Urvoy (Orange Labs)
15h30 – 16h00 Eric Moulines (Telecom ParisTech/LTCI)
16h00 – 16h15 Break
16h15 – 16h45 Pascal Bianchi (Telecom ParisTech/LTCI)
16h45 – 17h30 Olivier Fercoq (Telecom ParisTech/LTCI)
17h30 – 17h45 Closing

Programme:
——————————–
Yurii Nesterov (CORE/INMA, Université Catholique de Louvain)
Subgradient methods for huge-scale optimization problems (slides)
We consider a new class of huge-scale optimization problems: problems with sparse
subgradients. The most important functions of this type are piecewise linear. For problems
with uniform sparsity of the corresponding linear operators, we suggest a very efficient
implementation of subgradient iterations whose total cost depends logarithmically on the
dimension. This technique is based on a recursive update of the results of matrix/vector
products and the values of symmetric functions. It works well, for example, for matrices with
few nonzero diagonals and for max-type functions. We show that the updating technique
can be efficiently coupled with the simplest subgradient methods, and present promising
results of preliminary computational experiments.
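As a toy illustration of the recursive-update idea (our sketch, not code from the talk; all names are illustrative), consider f(x) = max_i (a_i·x − b_i) with sparse rows a_i: a subgradient step changes only the coordinates touched by the active row, so the stored products a_i·x can be patched rather than recomputed from scratch.

```python
def make_state(rows, b, x):
    # rows: list of sparse rows as dicts {j: a_ij}; precompute p_i = a_i.x - b_i
    p = [sum(a[j] * x[j] for j in a) - bi for a, bi in zip(rows, b)]
    # column index: for each coordinate j, the rows whose product involves x_j
    cols = {}
    for i, a in enumerate(rows):
        for j in a:
            cols.setdefault(j, []).append(i)
    return p, cols

def sparse_subgradient_step(rows, b, x, p, cols, step):
    # One subgradient step for f(x) = max_i (a_i.x - b_i).
    i_star = max(range(len(p)), key=lambda i: p[i])  # active linear piece
    g = rows[i_star]                                 # its row = sparse subgradient
    for j, a_ij in g.items():                        # touch only nonzero coords
        delta = -step * a_ij
        x[j] += delta
        for i in cols[j]:                            # recursively patch products
            p[i] += rows[i][j] * delta
    return i_star
```

The cost of a step is proportional to the nonzeros touched, not to the full dimension, which is the point of the updating technique described above (the talk's logarithmic cost additionally relies on fast updates of the max via a tree of symmetric functions).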

——————————–
Francis Bach
Beyond stochastic gradient descent for large-scale machine learning (slides)
Many machine learning and signal processing problems are traditionally cast as convex
optimization problems. A common difficulty in solving these problems is the size of the data,
where there are many observations ("large n") and each of these is large ("large p"). In this
setting, online algorithms such as stochastic gradient descent, which pass over the data only
once, are usually preferred over batch algorithms, which require multiple passes over the
data. In this talk, I will show how the smoothness of loss functions may be used to design
novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-
data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a
convergence rate of O(1/n) without strong convexity assumptions, while in the practical
finite-data setting, an appropriate combination of batch and online algorithms leads to
unexpected behaviors, such as a linear convergence rate for strongly convex problems, with
an iteration cost similar to stochastic gradient descent. (joint work with Nicolas Le Roux, Eric
Moulines and Mark Schmidt)
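To make the single-pass online setting concrete, here is a minimal sketch (ours, not the speaker's code) of constant-step-size least-squares SGD with Polyak–Ruppert averaging, the kind of averaged iterate behind the O(1/n) rate without strong convexity mentioned above.

```python
def averaged_sgd(data, dim, step, passes=1):
    # SGD on 0.5*(w.x - y)^2 with a constant step size; the running average
    # wbar of the iterates is the estimator returned to the user.
    w = [0.0] * dim
    wbar = [0.0] * dim
    n = 0
    for _ in range(passes):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            g = pred - y                     # derivative of the squared loss
            w = [wi - step * g * xi for wi, xi in zip(w, x)]
            n += 1                           # running mean of w_1, ..., w_n
            wbar = [(1 - 1.0 / n) * bi + (1.0 / n) * wi
                    for bi, wi in zip(wbar, w)]
    return wbar
```

The individual iterates w oscillate at a level set by the constant step size; averaging smooths these oscillations out, which is what yields the improved rate in the infinite-data setting.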

——————————–
Tanguy Urvoy
Generic bandit policies synthesis and voting bandits (slides)
In 2013 we studied a stochastic online learning scheme with partial monitoring where the
utility of decisions is only observable through an estimation of the environment parameters.
We proposed a generic pure-exploration algorithm called SAVAGE, able to cope with various
utility functions, from multi-armed bandit settings to dueling bandits. The primary application
of this setting was to offer a natural generalization of dueling bandits for situations where the
environment parameters reflect the idiosyncratic preferences of a mixed crowd.
The notion of Condorcet bandits that we introduced on this occasion has recently been
studied by other research teams, who improved our algorithm and obtained
interesting results.
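A minimal sketch of the Condorcet notion mentioned above (illustrative code, not the SAVAGE algorithm itself): given a preference matrix estimated from duels, the Condorcet winner is the arm that beats every other arm with probability above 1/2, and the preferences of a mixed crowd may admit no such arm.

```python
def condorcet_winner(pref):
    # pref[i][j] = probability that arm i wins a duel against arm j.
    # Returns the index of the Condorcet winner, or None if there is none
    # (e.g. when the crowd's preferences are cyclic).
    k = len(pref)
    for i in range(k):
        if all(pref[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None
```

The pure-exploration problem the talk addresses is precisely to identify such a winner (or another utility maximizer) from noisy duel outcomes rather than from the true matrix.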

——————————–
Eric Moulines (Telecom ParisTech)
Stochastic Proximal Algorithm with applications
We study a stochastic version of the proximal gradient algorithm where the gradient is
analytically intractable, and is approximated by Monte Carlo simulation. We derive a non-
asymptotic bound on the convergence rate, and we derive conditions involving the Monte
Carlo batch-size and the step-size of the algorithm under which convergence is guaranteed.
In particular, we show that the stochastic proximal gradient algorithm achieves the same
convergence rate as its deterministic counterpart. We extend the analysis to a stochastic
version of the fast iterative shrinkage-thresholding of Beck and Teboulle, 2009, whereby the
gradient is approximated by Monte Carlo simulation. Contrary to the deterministic setting
where the fast iterative shrinkage-thresholding is known to converge faster than the proximal
gradient algorithm, our results suggest that in the stochastic setting, the acceleration
scheme yields no improvement. To illustrate, we apply the algorithms to the estimation of
network structures from multiple realizations of a Gibbs measure.

——————————–
Pascal Bianchi (Telecom ParisTech)
A primal-dual stochastic coordinate descent algorithm with application to distributed optimization (slides)
We investigate the almost sure convergence of a stochastic coordinate descent
method and discuss applications to stochastic optimization problems and to distributed
optimization. In particular, we will introduce a primal-dual algorithm inspired by
recent works by Vu and Condat. We provide a randomized version of the algorithm based on
coordinate descent. In the case of distributed optimization, we consider a set of N agents
having private composite objective functions and seeking to find a consensus on the
minimum of the aggregate objective. In that case, our method yields a distributed iterative
algorithm where each agent uses both local computations and message passing in an
asynchronous manner.
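The consensus setting can be pictured with a much simpler synchronous scheme (our sketch only; the talk's primal-dual coordinate-descent algorithm is more general and asynchronous): each agent averages its variable with its neighbors' and then takes a gradient step on its private objective.

```python
def distributed_gradient_step(xs, grads, neighbors, step):
    # One synchronous round of consensus averaging + local gradient steps.
    # xs[i]: agent i's scalar variable; grads[i]: gradient of its private
    # objective; neighbors[i]: indices of agents it exchanges messages with.
    new = []
    for i in range(len(xs)):
        nbrs = neighbors[i] + [i]
        avg = sum(xs[j] for j in nbrs) / len(nbrs)   # message-passing step
        new.append(avg - step * grads[i](avg))       # local computation step
    return new
```

Iterating such rounds drives the agents toward agreement near the minimizer of the aggregate objective; the constant step size leaves a residual disagreement that vanishes as the step shrinks.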

———————————
Olivier Fercoq (Telecom ParisTech)
Accelerated, Parallel and Proximal Coordinate Descent (slides)
We propose a new stochastic coordinate descent method for minimizing the
sum of convex functions each of which depends on a small number of
coordinates only. Our method (APPROX) is simultaneously Accelerated,
Parallel and PROXimal. Hence, it converges at a rate O(1/k²) when
applied to a weakly convex function. The method can be used to solve the
Lasso and the dual SVM problem at this accelerated rate and on a
parallel machine. We finally give convergence results when using
non-uniform samplings.
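For reference, here is the serial, non-accelerated baseline that APPROX improves upon: plain proximal coordinate descent on the Lasso (our sketch; APPROX adds acceleration and parallel block sampling on top of this).

```python
def lasso_coordinate_descent(A, y, lam, iters):
    # Cyclic proximal coordinate descent on 0.5*||A x - y||^2 + lam*||x||_1.
    # Each coordinate update is a 1-D soft-thresholding step, and the
    # residual r = A x - y is maintained incrementally.
    n, p = len(A), len(A[0])
    x = [0.0] * p
    r = [-yi for yi in y]
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            g = sum(A[i][j] * r[i] for i in range(n))   # partial gradient
            z = x[j] - g / col_sq[j]
            new = max(abs(z) - lam / col_sq[j], 0.0) * (1.0 if z > 0 else -1.0)
            delta = new - x[j]
            if delta:
                x[j] = new
                for i in range(n):                      # cheap residual update
                    r[i] += A[i][j] * delta
    return x
```

Each update touches only one column of A, which is what makes coordinate methods attractive for sums of functions depending on few coordinates; APPROX keeps this per-step cost while achieving the accelerated O(1/k²) rate.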
