Session: Advances in Modeling and Optimization for MDP and Optimal Control
Chair: Yan Li; Minda Zhao
Cluster: Optimization Under Uncertainty and Data-driven Optimization

Talk 1: Beyond absolute continuity: a new class of dynamic risk measures
Speaker: Jincheng Yang
Abstract: The modern theory of risk measures copes with uncertainty by considering multiple probability measures. While it is often assumed that a reference probability measure exists under which all relevant probability measures are absolutely continuous, there are settings where this assumption does not hold, such as certain distributionally robust functionals. In this talk, we introduce a novel class of dynamic risk measures that do not rely on this assumption. We will discuss its convexity, coherence, and time consistency properties.
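As background (standard definitions, not specific to the new class in this talk): a conditional risk measure $\rho_t$ is coherent if it is monotone, translation invariant, positively homogeneous, and subadditive, and a dynamic risk measure $\{\rho_{t,T}\}$ is time consistent when it admits the nested recursion
$$\rho_{t,T}(Z_t,\ldots,Z_T) = Z_t + \rho_t\big(\rho_{t+1,T}(Z_{t+1},\ldots,Z_T)\big), \qquad \rho_{T,T}(Z_T) = Z_T.$$
The usual dual representations of such measures take a supremum over a family of measures dominated by a reference measure; the talk concerns how these properties can be retained without that domination assumption.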

Talk 2: Finite‑Time Bounds for Distributionally Robust TD Learning with Linear Function Approximation
Speaker: Yashaswini Murthy
Abstract: Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve high performance under model uncertainty. Existing convergence guarantees for robust temporal‑difference (TD) learning are either purely asymptotic, limited to tabular MDPs, dependent on restrictive discount‑factor or uncertainty‑set assumptions when function approximation is used, or reliant on a generative model.
This work relaxes these assumptions by presenting the first finite‑time sample‑complexity bound for robust TD learning with linear function approximation in the discounted‑reward setting, covering total‑variation and Wasserstein‑$p$ uncertainty sets without requiring generative access. Our algorithm combines a two‑time‑scale stochastic‑approximation update with an outer‑loop target‑network buffer, leveraging the sup‑norm contraction of the robust Bellman operator to stabilize updates that are not contractions in the weighted Euclidean norm. For both uncertainty models we establish an $\tilde{O}(1/\epsilon^{2})$ sample complexity to obtain an $\epsilon$-accurate value estimate.
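To make the moving parts concrete, here is a minimal Python sketch of robust TD(0) policy evaluation with linear features and a periodically refreshed target network. Everything in it is an illustrative assumption: the toy MDP, the greedy closed-form solver for the total-variation worst case, the single-time-scale update, and all hyperparameters. It is a sketch of the general technique, not the speakers' two-time-scale algorithm.

```python
# Toy sketch: robust TD(0) with linear function approximation, a TV
# uncertainty set, and an outer-loop target network. Assumes the nominal
# transition rows P[s] are known (no generative model of the worst case).
import numpy as np

rng = np.random.default_rng(0)

n_states, d = 6, 3                                  # states, feature dimension
P = rng.dirichlet(np.ones(n_states), size=n_states) # nominal kernel (policy fixed)
r = rng.uniform(0.0, 1.0, size=n_states)            # rewards r(s)
Phi = rng.standard_normal((n_states, d))            # feature matrix
gamma, delta = 0.9, 0.1                             # discount, TV radius

def tv_worst_case(p, v, radius):
    """min_q q@v  s.t.  q >= 0, sum(q) = 1, 0.5 * ||q - p||_1 <= radius.
    Greedy optimum: move up to `radius` mass from the highest-value
    states onto the single lowest-value state."""
    q, budget, worst = p.copy(), radius, np.argmin(v)
    for s in np.argsort(v)[::-1]:                   # highest value first
        if s == worst or budget <= 0:
            continue
        move = min(q[s], budget)
        q[s] -= move
        q[worst] += move
        budget -= move
    return q @ v

theta = np.zeros(d)                                 # online weights
theta_bar = np.zeros(d)                             # frozen target weights
alpha, sync_every = 0.05, 200

s = 0
for t in range(10_000):
    s_next = rng.choice(n_states, p=P[s])
    v_bar = Phi @ theta_bar                         # values under the target net
    # Robust Bellman target built from the nominal row P[s] and the TV ball;
    # the robust operator is a sup-norm contraction, which is what the
    # target network exploits for stability.
    target = r[s] + gamma * tv_worst_case(P[s], v_bar, delta)
    td_err = target - Phi[s] @ theta
    theta += alpha * td_err * Phi[s]
    if (t + 1) % sync_every == 0:                   # outer loop: refresh target
        theta_bar = theta.copy()
    s = s_next

print("robust value estimates:", np.round(Phi @ theta, 3))
```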

Talk 3: Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action
Speaker: Minda Zhao
Abstract: Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization imposes significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov Decision Processes (MDPs) with general state and action spaces, we develop a framework that provides a set of easily verifiable assumptions ensuring the Kurdyka-Łojasiewicz (KŁ) condition for the policy optimization problem. Leveraging the KŁ condition, policy gradient methods converge to the globally optimal policy at a non-asymptotic rate despite nonconvexity. Our results find applications in various control and operations models, including entropy-regularized tabular MDPs, Linear Quadratic Regulator (LQR) problems, stochastic inventory models, and stochastic cash balance problems, for which we show that stochastic policy gradient methods obtain an $\epsilon$-optimal policy with a sample size of $\tilde{\mathcal{O}}(\epsilon^{-1})$, polynomial in the planning horizon. Our results establish the first sample-complexity guarantees in the literature for multi-period inventory systems with Markov-modulated demand and for stochastic cash balance problems.
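For the entropy-regularized tabular case named in the abstract, the object being analyzed is the plain score-function (REINFORCE) estimator. Below is a minimal sketch of that estimator; the problem data, temperature, and step size are made up for illustration, and this is the vanilla method whose global convergence KŁ-type conditions are used to establish, not the authors' framework itself.

```python
# Minimal REINFORCE sketch for an entropy-regularized finite-horizon
# tabular MDP with a time-indexed softmax policy.
import numpy as np

rng = np.random.default_rng(1)
H, nS, nA = 5, 4, 3                            # horizon, #states, #actions
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state dist.
r = rng.uniform(size=(nS, nA))                 # rewards r(s, a)
tau = 0.1                                      # entropy temperature
logits = np.zeros((H, nS, nA))                 # softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rollout_gradient():
    """One trajectory; return the reward-to-go REINFORCE gradient."""
    grad = np.zeros_like(logits)
    s, path = 0, []
    for h in range(H):
        pi = softmax(logits[h, s])
        a = rng.choice(nA, p=pi)
        # Soft reward: entropy regularization folded into the reward signal.
        path.append((h, s, a, r[s, a] - tau * np.log(pi[a])))
        s = rng.choice(nS, p=P[s, a])
    G = 0.0
    for h, s, a, soft_r in reversed(path):
        G += soft_r                            # reward-to-go return
        pi = softmax(logits[h, s])
        score = -pi                            # grad of log pi(a|s) in logits
        score[a] += 1.0
        grad[h, s] += G * score
    return grad

eta = 0.5                                      # step size
for _ in range(2000):
    logits += eta * rollout_gradient()         # one-sample gradient ascent
```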

Speakers
Jincheng Yang

Yashaswini Murthy

Yashaswini Murthy is a PhD candidate in the Electrical and Computer Engineering department at the University of Illinois Urbana-Champaign (UIUC).
Yan Li

Minda Zhao

Thursday July 24, 2025 10:30am - 11:45am PDT
Joseph Medicine Crow Center for International and Public Affairs (DMC), Room 154, 3518 Trousdale Pkwy, Los Angeles, CA 90089
