Loading…
Wednesday July 23, 2025 10:30am - 11:45am PDT
Session: Robust optimization for sequential decision-making
Chair: Julien Grand-Clément
Cluster: Optimization Under Uncertainty and Data-driven Optimization

Talk 1: Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces
Speaker: Shengbo Wang
Abstract: We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\set{W_t, t \geq 0}$ is assumed to be independent and identically distributed, with a distribution that is either fully known or can be consistently estimated. However, the occurrence of distributional shifts, typical in engineering settings, necessitates the consideration of the robustness of the policy. This paper introduces a distributionally robust stochastic control paradigm that accommodates possibly adaptive adversarial perturbation to the noise distribution within a prescribed ambiguity set. We examine two adversary models: current-action-aware and current-action-unaware, leading to different dynamic programming equations. Furthermore, we characterize the optimal finite sample minimax rates for achieving uniform learning of the robust value function across continuum states under both adversary types, considering ambiguity sets defined by $f_k$-divergence and Wasserstein distance. Finally, we demonstrate the applicability of our framework across various real-world settings.

Talk 2: Robust Regret Minimization in Bayesian Offline Bandits
Speaker: Marek Petrik
Abstract: We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our bounds build heavily on new connections to monetary risk measures. Proving a matching lower bound, we show that our upper bounds are tight, and by minimizing them we are guaranteed to outperform the LCB approach. Our numerical results on synthetic domains confirm that our approach is superior to LCB.

Talk 3: On the interplay between average and discounted optimality in robust Markov decision processes
Speaker: Julien Grand-Clément
Abstract: We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell optimality. Many existing approaches to computing average-optimal policies solve for discount-optimal policies with a discount factor close to 1, but they only work under strong or hard-to-verify assumptions on the MDP structure such as unichain or ergodicity. We are the first to highlight the shortcomings of the classical definition of Blackwell optimality, which does not lead to simple algorithms for computing Blackwell-optimal policies and overlooks the pathological behaviors of optimal policies as regards the discount factors. To resolve this issue, in this paper, we show that when the discount factor is larger than the Blackwell discount factor, all discount-optimal policies become Blackwell-and average-optimal, and we derive a general upper bound on the Blackwell discount factor. Our upper bound on the Blackwell discount factor, parametrized by the bit-size of the rewards and transition probabilities of the MDP instance, provides the first reduction from average and Blackwell optimality to discounted optimality, without any assumptions, along with new polynomial-time algorithms. Our work brings new ideas from polynomials and algebraic numbers to the analysis of MDPs. Our results also apply to robust MDPs, enabling the first algorithms to compute robust Blackwell-optimal policies.

Speakers
SW

Shengbo Wang

Name: Dr. Slothington "Slow Convergence" McNapface Title: Distinguished Professor of Continuous Optimization & Energy Minimization Affiliation: The Lush Canopy Institute of Sluggish Algorithms Bio: Dr. Slothington McNapface is a leading expert in continuous optimization, specializing... Read More →
MP

Marek Petrik

Name: Marek PetrikTitle: Associate ProfessorAffiliation: University of New HampshireBio:Marek Petrik is an associate professor of Computer Science at the University of New Hampshire. Until 2016, he was a research staff member at the Mathematical Sciences Department of IBM’s T. J... Read More →
JG

Julien Grand-Clément

Name: Dr. Slothington "Slow Convergence" McNapface Title: Distinguished Professor of Continuous Optimization & Energy Minimization Affiliation: The Lush Canopy Institute of Sluggish Algorithms Bio: Dr. Slothington McNapface is a leading expert in continuous optimization, specializing... Read More →
Wednesday July 23, 2025 10:30am - 11:45am PDT
Taper Hall (THH) 212 3501 Trousdale Pkwy, 212, Los Angeles, CA 90089

Log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link