Workshop on Optimization for Learning and Control

June 4-6, 2025 — IMT School for Advanced Studies Lucca
Aula Guinigi

About the Workshop

Overview

This workshop brings together researchers and PhD students to exchange ideas and stimulate collaborations on optimization methods for machine learning and embedded control. The 3-day event features invited talks from leading experts and poster presentations highlighting innovative work by PhD students.

Speakers

Stephen Boyd

Stanford University, USA

Learning Parametrized Convex Functions

TBD

Volkan Cevher

EPFL, Switzerland

Training Deep Learning Models with Norm-Constrained LMOs

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision.
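
To make the basic mechanism concrete, here is a minimal sketch (our illustration, not the authors' exact algorithm) of an LMO-based update over an l-infinity norm-ball, which reduces to sign-gradient descent; the function names, the choice of norm, and the fixed step size are assumptions for illustration only.

    import numpy as np

    def lmo_linf_ball(g, radius=1.0):
        # Linear minimization oracle over the l-infinity ball:
        # argmin_{||s||_inf <= radius} <g, s> = -radius * sign(g).
        return -radius * np.sign(g)

    def lmo_update(x, grad, step_size, radius=1.0):
        # Update for an *unconstrained* problem built from the LMO
        # direction; with the l-infinity ball this recovers
        # sign-gradient descent.
        return x + step_size * lmo_linf_ball(grad, radius)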

Moritz Diehl

University of Freiburg, Germany

Numerical Optimal Control for Nonsmooth Dynamic Systems

This talk presents recent progress in the field of optimal control of nonsmooth dynamic systems, motivated by applications in robotics. The considered systems are characterized by a special type of ordinary differential equations (differential inclusions) that exhibit not only state-dependent switches but even state-dependent state jumps due to contacts and friction. Crucially, the nonsmoothness can be encoded in the form of complementarity systems that result from the solution of convex optimization problems depending on the system state. We discuss several techniques, with either fixed or variable time steps, to transcribe the continuous-time problem into a finite-dimensional mathematical program with complementarity constraints (MPCC). Finally, we present tailored numerical methods to efficiently solve the resulting nonlinear MPCC problems, whose problem functions involve lower-level convex solvers, and application results in simulations and real-world experiments with an assembly robot at Siemens research in Munich.
This talk presents joint work with Armin Nurkanovic, Anton Pozharskiy, Christian Dietz and Sebastian Albrecht.
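
In a generic hedged form (our notation, not necessarily the talk's), the transcription yields a finite-dimensional problem of the type

    \min_{w,\lambda} \; F(w)
    \quad \text{s.t.} \quad h(w,\lambda) = 0, \qquad 0 \le G(w) \perp \lambda \ge 0,

where h collects the discretized dynamics, and the complementarity condition 0 \le G(w) \perp \lambda \ge 0 abbreviates G(w) \ge 0, \lambda \ge 0, G(w)^\top \lambda = 0, encoding the state-dependent switches and jumps.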

Download slides (PDF)

Mikael Johansson

KTH Royal Institute of Technology, Sweden

From Core to Clusters: Tailoring Optimization Algorithms to Modern Compute Architectures

Optimization algorithms power applications from embedded control systems to large-scale machine learning. However, the rapid diversification of hardware requires algorithm designs that move beyond abstract complexity measures to explicit hardware awareness. This talk presents our efforts toward hardware-aware optimization methods tailored specifically to the architectural constraints and opportunities inherent in embedded systems, GPU accelerators, and decentralized compute clusters. For compute accelerators, we develop a GPU-optimized Douglas-Rachford splitting algorithm for optimal transport problems. By carefully reorganizing the computational workflow to align with GPU warp execution patterns and improve memory coalescing, our method attains order-of-magnitude speedups compared to traditional implementations, all while maintaining rigorous convergence guarantees. For distributed environments, we address the fundamental challenge of making decentralized training outperform AllReduce-based methods in practical high-performance computing settings. Our approach introduces a novel communication-computation interleaving strategy combined with a specialized Adam optimizer that mitigates the effects of small local batch sizes. By carefully analyzing the interplay between topology design, communication patterns, and optimizer behavior, we demonstrate both reduced wall-clock training times and improved generalization performance on DNN training. Across these computing paradigms, we show how optimization algorithms can preserve theoretical rigor while exploiting hardware-specific characteristics to achieve significant performance gains.
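
For background, the core Douglas-Rachford iteration for a splitting min_x f(x) + g(x) is only a few lines; a minimal sketch follows (generic form only; the talk's contribution lies in mapping these steps onto GPU warps and memory, which this sketch does not attempt).

    def douglas_rachford(prox_f, prox_g, z, gamma, num_iters=1000):
        # Generic Douglas-Rachford splitting for min_x f(x) + g(x).
        # prox_f and prox_g are proximal operators, mapping (v, gamma)
        # to argmin_x f(x) + ||x - v||^2 / (2 * gamma).
        for _ in range(num_iters):
            x = prox_f(z, gamma)           # proximal step on f
            y = prox_g(2 * x - z, gamma)   # reflected proximal step on g
            z = z + y - x                  # fixed-point averaging update
        return x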

Download slides (PDF)

Ion Necoara

University Politehnica Bucharest, Romania

Linearized Augmented Lagrangian Methods for Optimization Problems with Nonlinear Equality Constraints

We consider (nonsmooth) nonconvex optimization problems with nonlinear equality constraints. We assume that (some part of) the objective function and the functional constraints are locally smooth. For solving this problem, we propose linearized augmented Lagrangian methods, i.e., we linearize the objective function and the functional constraints in a Gauss-Newton fashion at the current iterate within the augmented Lagrangian function and add a quadratic regularization, yielding a (convex) subproblem that is easy to solve, and whose solution is the next primal iterate. The update of the dual multipliers is also based on the linearization of functional constraints. Under some specific dynamic regularization parameter choices, we prove boundedness and global asymptotic convergence of the iterates to a first-order solution of the problem. We also derive convergence guarantees for the iterates to an ϵ-first-order solution.
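
In our notation (a generic sketch of the scheme described above, for min_x f(x) subject to h(x) = 0), one iteration solves the convex subproblem

    x_{k+1} = \arg\min_x \Big\{ \nabla f(x_k)^\top (x - x_k)
        + \lambda_k^\top \big( h(x_k) + J_h(x_k)(x - x_k) \big)
        + \tfrac{\rho_k}{2} \| h(x_k) + J_h(x_k)(x - x_k) \|^2
        + \tfrac{\beta_k}{2} \| x - x_k \|^2 \Big\},

where J_h is the Jacobian of h and \beta_k is the dynamic regularization parameter, followed by a multiplier update based on the same linearization, e.g. \lambda_{k+1} = \lambda_k + \rho_k \big( h(x_k) + J_h(x_k)(x_{k+1} - x_k) \big).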

Yurii Nesterov

Center for Operations Research at Corvinus University of Budapest, Hungary, and
School of Data Science at the Chinese University of Hong Kong, Hong Kong SAR

Asymmetric Long-Step Primal-Dual Interior-Point Methods with Dual Centering

We discuss a new approach to the development of asymmetric Interior-Point Methods for solving primal-dual problems of Conic Optimization. It is very efficient for problems where the dual formulation is simpler than the primal one. Problems of this type arise often in Semidefinite Optimization (SDO), for which we propose a new primal-dual method with very attractive computational cost. In this approach, we do not need sophisticated Linear Algebra, restricting ourselves to standard Cholesky factorization. Nevertheless, our complexity bounds match the best-known polynomial-time results. Moreover, for symmetric cones the bounds automatically depend on the smaller of the barrier parameters of the primal and dual feasible sets. We show by SDO examples that the corresponding gain can be very large. We discuss some classes of SDO problems where the complexity bounds are proportional to the square root of the number of linear equality constraints and the computational cost of one iteration is the same as in Linear Optimization. Our theoretical developments are supported by encouraging numerical testing.

Giuseppe Notarstefano

University of Bologna, Italy

System Theory Tools for Optimization in Learning and Control

Optimization is a fundamental tool for solving many control and learning tasks, but optimization algorithms are often treated simply as building blocks with given convergence guarantees. In this talk I will present a different perspective in which optimization algorithms are viewed as discrete-time dynamical systems that can be combined or integrated as subsystems into a closed-loop scheme. This opens up the possibility of using system-theoretic tools to analyze their (convergence) properties and to gain key insights for the design of novel and possibly advanced schemes. This is particularly interesting for complex scenarios involving large-scale and distributed systems, or online schemes that simultaneously involve optimization, learning, and control. In particular, timescale separation will be proposed as a tool to frame accelerated and model-free optimization algorithms, as well as to design schemes for distributed computing. Finally, scenarios combining optimization with learning and control schemes will be shown.

Panos Patrinos

KU Leuven, Belgium

Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness

We analyze nonlinearly preconditioned gradient methods for solving smooth minimization problems. We introduce a generalized smoothness property, based on the notion of abstract convexity, that is broader than Lipschitz smoothness, and provide sufficient first- and second-order conditions. Notably, our framework encapsulates algorithms associated with the gradient clipping method and brings out novel insights for the class of (L0, L1)-smooth functions that has recently received widespread interest, thus allowing us to go beyond already established methods. We investigate the convergence of the proposed method in both the convex and nonconvex settings.
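
As one concrete special case (an illustrative formula, not the paper's general framework), a nonlinear preconditioner \varphi applied to the gradient recovers clipped gradient descent:

    x_{k+1} = x_k - \gamma \, \varphi(\nabla f(x_k)), \qquad
    \varphi(g) = \frac{g}{\max\{ 1, \|g\| / c \}},

which leaves small gradients unchanged and rescales large ones to norm at most c, the kind of behavior studied for (L0, L1)-smooth functions.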

Program

Day 1 – Wednesday, June 4

9:00 – 9:30     Registration
9:30 – 10:30    Invited Talk 1 – Volkan Cevher
10:30 – 11:30   Coffee Break
11:30 – 12:30   Invited Talk 2 – Mikael Johansson
12:30 – 15:00   Lunch Break
15:00 – 16:00   Invited Talk 3 – Ion Necoara
16:00 – 18:00   Coffee Break and Poster Presentations

Day 2 – Thursday, June 5

9:30 – 10:30    Invited Talk 4 – Stephen Boyd
10:30 – 11:30   Coffee Break
11:30 – 12:30   Invited Talk 5 – Yurii Nesterov
12:30 – 15:00   Lunch Break
15:00 – 16:00   Invited Talk 6 – Panos Patrinos
16:00 – 18:00   Coffee Break and Poster Presentations

Day 3 – Friday, June 6

9:30 – 10:30    Invited Talk 7 – Giuseppe Notarstefano
10:30 – 11:30   Coffee Break
11:30 – 12:30   Invited Talk 8 – Moritz Diehl
12:30 – 15:00   Lunch

Poster Presentations

Samuel Erickson Andersson – Personalized Federated Learning under Model Dissimilarity Constraints
Adeyemi D. Adeoye – SCORE: Approximating Curvature Information under Self-Concordant Regularization
Riccardo Brumali – Data-Driven Distributed Optimization via Aggregative Tracking and Deep Learning
Amir Daghestani – Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Brecht Evens – Progressive Decoupling of Linkage Problems Beyond Elicitable Monotonicity
Matteo Facchino – Tracking MPC Tuning in Continuous Time: A First-Order Approximation of Economic MPC
Nick Korbit – Scalable Gauss–Newton Methods for Training Deep Neural Networks
Pablo Krupa – Restart of Accelerated First-Order Methods with Linear Convergence Under a Quadratic Functional Growth Condition
Sampath Kumar Mulagaleti – System Identification with Controller-Synthesis Guarantees
Pieter Pas – Parallelization and Vectorization of a Structure-Exploiting Solver for Optimal Control
Alice Rosetti – Multi-Robot Safe Cooperation via Combined Predictive Filters and Constraint Compression
Alberto Zaupa – Accelerating ADMM for Embedded MPC on Modern Hardware
Kui Xie – Online Design of Experiments by Active Learning for System Identification of Autoregressive Models
Zesen Wang – From Promise to Practice: Realizing High-Performance Decentralized Training

Venue

Aula Guinigi, IMT School for Advanced Studies Lucca
Piazza San Francesco, 19, 55100 Lucca LU, Italy

Travel

From Pisa Airport (PSA) to Lucca

The easiest way to reach Lucca from Pisa Airport is by train.

  1. Take the PisaMover: From the airport terminal, take the PisaMover shuttle train to Pisa Centrale Station. The journey takes about 5 minutes, and departures are frequent.
  2. Train to Lucca: From Pisa Centrale, take a regional train to Lucca. The journey takes approximately 25-30 minutes, and trains run regularly.

You can purchase train tickets at Pisa Centrale or online via the Trenitalia website.

From Florence Airport (FLR) to Lucca

To travel from Florence Airport to Lucca:

  1. Tram to Florence Station: Take the T2 tram line from Florence Airport to Firenze Santa Maria Novella (SMN) train station. The tram ride is about 20 minutes.
  2. Train to Lucca: From Firenze SMN, take a regional train to Lucca. The journey typically takes 1 hour 15 minutes to 1 hour 45 minutes, depending on the specific train.

Tickets can be bought at the airport, at Florence SMN, or online.

Getting to IMT Lucca from Lucca Train Station

IMT Lucca is conveniently located within short walking distance of Lucca Train Station. The walk takes approximately 15-20 minutes and leads you through the charming streets of Lucca.

Upon exiting the train station, follow the signs towards the city center (centro storico). Continue straight along Via del Pallone, then turn left onto Via Santa Croce. Piazza San Francesco will be on your right.

Organizing Committee

  • Alberto Bemporad (IMT School for Advanced Studies Lucca, Italy)
  • Mario Zanon (IMT School for Advanced Studies Lucca, Italy)
  • Moritz Diehl (University of Freiburg, Germany)
  • Panagiotis Patrinos (KU Leuven, Belgium)
  • Puya Latafat (IMT School for Advanced Studies Lucca, Italy)

Contact

For inquiries, please email: eventi@imtlucca.it