Monday July 21, 2025 10:30am - 11:45am PDT
Session: Robust decision making in dynamic environments
Chair: Tobias Sutter
Cluster: Optimization Under Uncertainty and Data-driven Optimization

Talk 1: Towards Optimal Offline Reinforcement Learning
Speaker: Mengmeng Li
Abstract: We study offline reinforcement learning problems with a long-run average reward objective. The state-action pairs generated by any fixed behavioral policy form a Markov chain, and the empirical state-action-next-state distribution satisfies a large deviations principle. We use the rate function of this large deviations principle to construct an uncertainty set for the unknown true state-action-next-state distribution. We also construct a distribution shift transformation that maps any distribution in this uncertainty set to a state-action-next-state distribution of the Markov chain generated by a fixed evaluation policy, which may differ from the unknown behavioral policy. We prove that the worst-case average reward of the evaluation policy over all distributions in the shifted uncertainty set provides, in a rigorous statistical sense, the least conservative estimator of the average reward under the unknown true distribution. This guarantee holds even if one only has access to a single trajectory of serially correlated state-action pairs. The emerging robust optimization problem can be viewed as a robust Markov decision process with a non-rectangular uncertainty set, and we adapt an efficient policy gradient algorithm to solve it. Numerical experiments show that our method compares favorably with state-of-the-art approaches.
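
As a rough illustration of the estimator described in the abstract (the notation below is editorial, not taken from the talk): with \hat{\mu} the empirical state-action-next-state distribution, I its large deviations rate function, T_\pi the distribution shift transformation induced by the evaluation policy \pi, and \alpha a confidence radius, the robust estimate takes the form

\[
  \hat{\varrho}_\alpha(\pi) \;=\; \inf_{\mu:\, I(\mu)\le \alpha} \; \mathbb{E}_{(s,a,s') \sim T_\pi(\mu)}\big[ r(s,a,s') \big],
\]

i.e., the worst-case average reward of \pi over the shifted uncertainty set. Because the rate-function constraint couples transitions across states, the set is non-rectangular, which is why a standard rectangular robust MDP solver does not apply directly.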

Talk 2: Optimal Decision Making in Abstract Stochastic Environments
Speaker: Radek Salac
Abstract: Given data from an abstract stochastic process, we study how to construct a decision for a general stochastic optimization problem in a statistically optimal manner. Our approach seeks decisions for which the risk of the shifted regret, evaluated along the underlying stochastic data process, converges to zero at a specified exponential rate under a minimal shift. This optimal decision emerges as a solution to a specific multi-objective optimization problem driven by the properties of the data-generating process, in particular by a corresponding rate function, a generalization of the well-known concept from large deviations theory. Moreover, the regret of such a decision is itself proven to converge to zero, providing a notion of consistency. Our findings are established within a highly abstract framework, of which the above interpretation is a mere instance. The characterization of risk, crucial to our main results, is grounded in the concept of a so-called maxitive integral, whose properties resemble those of the less universal Varadhan lemma in the context of asymptotic relative entropy. Several well-known results from the literature on data-driven decision-making under uncertainty can be recovered as special cases of our general framework.
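
One heavily simplified reading of the criterion above (editorial notation, and certainly coarser than the talk's abstract framework): a decision \hat{x}_n built from n observations is sought so that, for a prescribed rate \rho > 0 and a regret threshold \delta shifted as little as possible,

\[
  \limsup_{n\to\infty} \; \frac{1}{n} \log \mathbb{P}\big( \mathrm{regret}(\hat{x}_n) > \delta \big) \;\le\; -\rho ,
\]

mirroring the exponential guarantees familiar from large deviations-based, data-driven optimization. The tension between the achievable rate \rho and the size of the shift is one way to picture the multi-objective problem mentioned in the abstract.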

Talk 3: Revisiting Model Selection for Math Programming-Based Approximate Dynamic Programming
Speaker: Negar Soheili
Abstract: Approximations of Markov decision processes are widely used to develop policies for large-scale sequential decision-making problems. Math-programming-based approximate dynamic programming (ADP) methods, such as approximate linear programming (ALP) and pathwise optimization (PO), are notable for providing both policies and upper bounds on policy performance. ALP and PO typically require solving large-scale linear or convex programs, and recent advances in first-order methods have improved their solvability. Traditionally, ADP models are compared assuming fixed features and exactly solvable models, in which case PO outperforms ALP. However, with machine learning techniques such as random features, both ALP and PO become asymptotically optimal as more features are added. We revisit model selection between ALP and PO under random features and introduce a new ALP relaxation that improves both quality and solvability compared to the original ALP with fixed features. When neither the ALP relaxation nor the PO model clearly dominates, solution methods and model structure become critical. Our ALP relaxation has a separability structure that makes it preferable to PO, both theoretically and numerically, when using block-coordinate descent, the state-of-the-art method for solving PO. These findings offer new insights into model selection for math-programming-based ADP, where feature architecture, model, and solution method must be considered collectively, and they are potentially relevant beyond these specific approaches.
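
For context, a minimal sketch of the standard ALP in its textbook form (the baseline model, not the relaxation introduced in the talk): with features \Phi, weight vector \theta, discount factor \gamma, state-relevance weights \nu, rewards r, and transition kernel p, the ALP reads

\[
  \min_{\theta} \; \nu^{\top} \Phi \theta
  \quad \text{s.t.} \quad
  (\Phi\theta)(s) \;\ge\; r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, (\Phi\theta)(s') \quad \forall (s,a).
\]

Any feasible \Phi\theta upper-bounds the optimal value function, which is the source of the performance bounds mentioned above; PO tightens such bounds by additionally optimizing over a parametrized family of penalty terms. How the talk's relaxation modifies these constraints to obtain its separability structure is not reproduced here.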

Speakers

Tobias Sutter


Mengmeng Li


Radek Salac

Doctoral Student, Chair of Machine Learning and Optimization, University of Konstanz
Location: Taper Hall (THH) 114, 3501 Trousdale Pkwy, Los Angeles, CA 90089