Wednesday July 23, 2025 10:30am - 11:45am PDT
Session: Statistical and Computational Aspects of Multi-Agent Games
Chair: Zhuoran Yang
Cluster: Multi-agent Optimization and Games

Talk 1: Partially Observable Multi-Agent Reinforcement Learning with Information Sharing
Speaker: Kaiqing Zhang
Abstract: We study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information sharing among agents, a common practice in empirical multi-agent RL and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for efficiently solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., achievable in quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the team-optimal solution in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a much more challenging goal. We establish concrete computational and sample complexities under several common structural assumptions of the model. We hope our study opens up possibilities for leveraging, and even designing, different information structures, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.
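To make the information-sharing idea concrete, here is a minimal sketch under assumed dynamics: if both agents broadcast their observations at every step, each of them can run the same Bayes filter on the shared history and so maintain an identical common belief over the hidden state, reducing the POSG to a game over belief states. The 3-state model, the tensors `P`, `O1`, `O2`, and the helper `common_belief_update` below are hypothetical placeholders for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state POSG with two agents, two actions and two observations
# each; all model tensors are random placeholders.
nS, nA, nO = 3, 2, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nA))   # P[s, a1, a2] -> dist over next state
O1 = rng.dirichlet(np.ones(nO), size=nS)            # O1[s] -> dist over agent 1's observation
O2 = rng.dirichlet(np.ones(nO), size=nS)            # O2[s] -> dist over agent 2's observation

def common_belief_update(b, a1, a2, o1, o2):
    """One Bayes-filter step on the shared history.

    Because both observations are broadcast, every agent can compute this
    same posterior, so they all agree on one common belief state."""
    b_pred = b @ P[:, a1, a2]                # predict: sum_s b(s) * P(s' | s, a1, a2)
    b_new = b_pred * O1[:, o1] * O2[:, o2]   # condition on both shared observations
    return b_new / b_new.sum()

b = np.ones(nS) / nS                         # uniform prior over states
b = common_belief_update(b, a1=0, a2=1, o1=1, o2=0)
print("common belief after one shared step:", b)
```

This exact filter is only the unapproximated baseline; the abstract's contribution concerns approximating the shared common information so that planning over the resulting model stays quasi-efficient.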

Talk 2: Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Speaker: Ashkan Soleymani
Abstract: Uncoupled learning dynamics for multi-agent interactions (“games”) define iterative update rules that each agent can apply repeatedly to improve their strategy. A celebrated result establishes that, for several choices of learning dynamics, global notions of equilibrium arise in the system. This connection runs deep and is far from trivial: equilibrium emerges despite formally chaotic behavior. Today, learning dynamics are the most scalable technique for equilibrium computation in large-scale, general games.
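As one concrete instance of such uncoupled dynamics, the sketch below (a toy with an assumed payoff matrix and step size, not the paper's Cautious Optimism method) has both players of a zero-sum matrix game independently run multiplicative weights update using only their own payoff feedback; the time-averaged strategies approach a Nash equilibrium.

```python
import numpy as np

# Matching pennies: the row player pays A[i, j] to the column player,
# so row minimizes x^T A y and column maximizes it.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
eta, T = 0.1, 5000

x = np.ones(2) / 2          # row player's mixed strategy
y = np.ones(2) / 2          # column player's mixed strategy
x_avg = np.zeros(2)
y_avg = np.zeros(2)

for _ in range(T):
    # Uncoupled: each player observes only its own expected-payoff vector,
    # never the opponent's strategy or update rule.
    loss_x = A @ y           # expected loss of each row action
    gain_y = A.T @ x         # expected gain of each column action
    x *= np.exp(-eta * loss_x); x /= x.sum()
    y *= np.exp(eta * gain_y);  y /= y.sum()
    x_avg += x
    y_avg += y

x_avg /= T
y_avg /= T
# Duality gap of the averaged strategies; it shrinks toward 0 as T grows.
gap = (A.T @ x_avg).max() - (A @ y_avg).min()
print(f"duality gap of average strategies after {T} rounds: {gap:.4f}")
```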

Talk 3: Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms
Speaker: Weiqiang Zheng
Abstract: Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix and $\tilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of $O(1/\sqrt{T})$, while the best known last-iterate convergence rate for OMWU depends on a game-dependent constant that could be arbitrarily large. This raises the question: **is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose?** Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small $\delta>0$, there exists a $2\times 2$ matrix game such that the algorithm admits a constant duality gap even after $1/\delta$ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
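For a concrete picture of the objects involved, here is a minimal sketch of OMWU self-play on a $2\times 2$ zero-sum game that tracks the last-iterate duality gap the abstract discusses; the payoff matrix, step size, and horizon are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Matching pennies again: row minimizes x^T A y, column maximizes it.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
eta, T = 0.05, 10000

x = np.ones(2) / 2
y = np.ones(2) / 2
prev_lx = A @ y              # previous loss vectors, used as the
prev_ly = -(A.T @ x)         # one-step "optimistic" prediction

for _ in range(T):
    lx = A @ y               # row player's current loss vector
    ly = -(A.T @ x)          # column player's loss vector (it maximizes payoff)
    # OMWU: multiplicative step on the corrected loss 2*l_t - l_{t-1}.
    x = x * np.exp(-eta * (2 * lx - prev_lx)); x /= x.sum()
    y = y * np.exp(-eta * (2 * ly - prev_ly)); y /= y.sum()
    prev_lx, prev_ly = lx, ly

# Last-iterate duality gap at the final pair (x, y), not at the averages.
gap = (A.T @ x).max() - (A @ y).min()
print(f"last-iterate duality gap after {T} rounds: {gap:.6f}")
```

The talk's negative result is about how fast this last-iterate gap can shrink for OMWU-style algorithms that retain the full past, in contrast to the $O(1/\sqrt{T})$ guarantee for OGDA.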

Speakers

Kaiqing Zhang

Weiqiang Zheng

PhD candidate, Yale University
Weiqiang Zheng is a fourth-year computer science PhD student at Yale University, advised by Prof. Yang Cai. He received his bachelor's degree from the Turing Class at Peking University. He has broad interests in algorithmic game theory, online learning, and optimization.
Taper Hall (THH) 114, 3501 Trousdale Pkwy, Los Angeles, CA 90089
