Wednesday July 23, 2025 4:15pm - 5:30pm PDT
Session: Fixed Point and Min-Max Theory in Data Science
Chairs: Ahmet Alacaoglu; Yura Malitsky; Stephen J. Wright
Cluster: Fixed Points and Variational Inequalities

Talk 1: Near-Optimal Sample Complexity for MDPs via Anchoring
Speaker: Roberto Cominetti
Abstract: We study a new model-free algorithm to compute $\varepsilon$-optimal policies for average reward Markov decision processes, in the weakly communicating case. Given a generative model, our procedure combines a recursive sampling technique with Halpern's anchored iteration, and computes an $\varepsilon$-optimal policy with sample and time complexity $\widetilde{O}(|\mathcal{S}||\mathcal{A}|\|h^*\|_{sp}^{2}/\varepsilon^{2})$ both in high probability and in expectation. To our knowledge, this is the best complexity among model-free algorithms, matching the known lower bound up to a factor $\|h^*\|_{sp}$. Although the complexity bound involves the span seminorm $\|h^*\|_{sp}$ of the unknown bias vector, the algorithm requires no prior knowledge and implements a stopping rule which guarantees with probability 1 that the procedure terminates in finite time. We also analyze how these techniques can be adapted for discounted MDPs. Joint work with Mario Bravo and Jongmin Lee.
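For intuition on the anchoring technique, here is a minimal NumPy sketch of Halpern's anchored iteration applied to a generic nonexpansive map. The map `T`, the anchoring weights $\beta_k = 1/(k+2)$, and the test problem are illustrative assumptions; the recursive sampling scheme and stopping rule from the talk are not reproduced here.

```python
import numpy as np

def halpern(T, x0, iters=2000):
    """Halpern's anchored iteration:
    x_{k+1} = beta_k * x0 + (1 - beta_k) * T(x_k),  beta_k = 1/(k+2)."""
    x = x0.copy()
    for k in range(iters):
        beta = 1.0 / (k + 2)               # standard anchoring weights
        x = beta * x0 + (1.0 - beta) * T(x)
    return x

# Illustrative nonexpansive affine map (assumed, not from the talk).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.99 / np.linalg.norm(A, 2)           # spectral norm 0.99 => nonexpansive
c = rng.standard_normal(5)
T = lambda x: A @ x + c

x = halpern(T, np.zeros(5))
print(np.linalg.norm(x - T(x)))            # fixed-point residual; O(1/k) for Halpern
```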

Talk 2: On the Convergence of Self-Play in Games with Bandit Feedback
Speaker: Haipeng Luo
Abstract: Self-play via online learning algorithms has emerged as a powerful paradigm for solving minimax problems and, more broadly, for training agents in complex multi-agent environments, underpinning some of the most striking successes in AI, such as AlphaGo and superhuman AI for poker. It enables agents to improve iteratively by interacting with copies of themselves, offering a natural path toward equilibria in games. Despite intensive study in recent years, however, the convergence of self-play dynamics is still not fully understood, especially when agents observe only the reward of the actions actually played (i.e., bandit feedback), a situation prevalent in many applications. In this talk, I will discuss two recent results on this front. In the first part, motivated by the well-known accelerated average-iterate convergence rate when learning with gradient feedback, I will address the question of whether acceleration is also possible under bandit feedback and provide an affirmative answer using a simple algorithm that enjoys instance-dependent convergence rates. In the second part, I will switch focus to the more challenging notion of last-iterate convergence and discuss a new uncoupled self-play dynamic with an improved convergence rate, leveraging a simple black-box reduction from last-iterate convergence to average-iterate convergence.
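As a concrete (and much simpler) instance of self-play under bandit feedback, the sketch below runs EXP3 against itself in a zero-sum matrix game; the average strategies then approximate a Nash equilibrium at the standard no-regret rate. EXP3, the step size `eta`, and the exploration level `gam` are illustrative stand-ins, not the accelerated or last-iterate dynamics from the talk.

```python
import numpy as np

def softmax(w):
    z = np.exp(w - w.max())
    return z / z.sum()

def exp3_selfplay(G, T=20000, eta=0.01, gam=0.1, seed=1):
    """Self-play in the zero-sum game max_x min_y x^T G y under bandit
    feedback: each round, both players see only the payoff G[i, j] of
    the action pair actually played."""
    n, m = G.shape
    rng = np.random.default_rng(seed)
    wx, wy = np.zeros(n), np.zeros(m)
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for _ in range(T):
        px = (1 - gam) * softmax(wx) + gam / n   # explicit exploration
        py = (1 - gam) * softmax(wy) + gam / m
        i = rng.choice(n, p=px)
        j = rng.choice(m, p=py)
        g = G[i, j]                     # the only feedback either player receives
        wx[i] += eta * g / px[i]        # importance-weighted gain (max player)
        wy[j] += eta * (-g) / py[j]     # importance-weighted gain (min player)
        avg_x += px
        avg_y += py
    return avg_x / T, avg_y / T         # average iterates approximate a Nash eq.

# Rock-paper-scissors: the unique equilibrium is uniform play.
G = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
x, y = exp3_selfplay(G)
print(x, y)                             # both close to (1/3, 1/3, 1/3)
```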

Talk 3: A first-order algorithm for decentralised min-max problems
Speaker: Matthew Tam
Abstract: In this work, we consider a connected network of finitely many agents working cooperatively to solve a min-max problem with convex-concave structure. We propose a decentralised first-order algorithm which can be viewed as combining features of two algorithms: PG-EXTRA for decentralised minimisation problems and the forward reflected backward method for (non-distributed) min-max problems. In each iteration of our algorithm, each agent computes the gradient of the smooth component of its local objective function as well as the proximal operator of its nonsmooth component, followed by a round of communication with its neighbours.
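To make the second ingredient concrete, here is a minimal NumPy sketch of the (centralised, single-agent) forward reflected backward update on an illustrative bilinear saddle-point problem. The problem data, step size, and projections are assumptions for illustration; the decentralised algorithm from the talk additionally interleaves communication rounds with neighbours.

```python
import numpy as np

def forb(F, prox, z0, gamma, iters=5000):
    """Forward reflected backward:
    z_{k+1} = prox_{gamma*g}(z_k - gamma*(2*F(z_k) - F(z_{k-1})))."""
    z, Fz_old = z0.copy(), F(z0)
    for _ in range(iters):
        Fz = F(z)
        z = prox(z - gamma * (2.0 * Fz - Fz_old))
        Fz_old = Fz
    return z

# Illustrative problem (assumed, not from the talk):
#   min_{||x||<=1} max_{||y||<=1}  b^T x + x^T A y - c^T y
rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
b, c = rng.standard_normal(n), rng.standard_normal(n)

def F(z):                                # saddle operator (grad_x L, -grad_y L)
    x, y = z[:n], z[n:]
    return np.concatenate([b + A @ y, c - A.T @ x])

def proj_ball(v):                        # projection onto the unit Euclidean ball
    return v / max(1.0, np.linalg.norm(v))

def prox(z):                             # prox of the indicator of the two balls
    return np.concatenate([proj_ball(z[:n]), proj_ball(z[n:])])

gamma = 0.4 / np.linalg.norm(A, 2)       # FoRB needs gamma < 1/(2L), L = ||A||
z = forb(F, prox, np.zeros(2 * n), gamma)
print(z[:n], z[n:])                      # approximate saddle point
```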

Speakers

Stephen J. Wright

UW-Madison
Stephen J. Wright is the George B. Dantzig Professor of Computer Sciences, Sheldon Lubar Chair of Computer Sciences, and Hilldale Professor at the University of Wisconsin-Madison. He also serves as Chair of the Computer Sciences Department. His research is in computational optimization...

Roberto Cominetti

Full Professor, Pontificia Universidad Católica de Chile
Roberto Cominetti graduated as Mathematical Engineer from Universidad de Chile in 1986 and received a PhD in Applied Mathematics from Université Blaise Pascal (France) in 1989...

Haipeng Luo

University of Southern California
Haipeng Luo is an associate professor and IBM Early Career Chair in the Thomas Lord Department of Computer Science at the University of Southern California. He obtained his PhD from Princeton University in 2016, following which he spent a year at Microsoft Research as a postdoctoral...

Ahmet Alacaoglu

Matthew Tam

Yura Malitsky
Taper Hall (THH) 208 3501 Trousdale Pkwy, 208, Los Angeles, CA 90089
