Session: Data-Driven Decision and Control
Chair: Guannan Qu
Cluster: Optimization for Data Science
Talk 1: Robustness in Data-Driven Control: A Requirement? An Opportunity?
Speaker: Laixi Shi
Abstract: Reinforcement learning (RL), which strives to learn desirable sequential decisions from trial-and-error interactions with an unknown environment, has recently achieved remarkable success in a variety of domains, including games and large language model alignment. Yet while standard RL has been heavily investigated, a policy learned in an ideal, nominal environment can fail catastrophically when the deployment environment is subject to small changes in task objectives or adversarial perturbations, especially in high-stakes applications such as robotics and clinical trials. This talk concerns the central issues of sample efficiency and model robustness in RL, with the goal of reducing the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), which aims to learn a policy that optimizes worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely unsettled regardless of the uncertainty set in use, and it was unclear whether distributional robustness bears any statistical consequence when benchmarked against standard RL. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs: the statistical cost incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set. In addition, we break the sample barrier of robust RL in the offline setting by providing the first provably near-optimal algorithm for offline robust RL that can learn under simultaneous model uncertainty and limited historical data.
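To make the RMDP objective concrete, the sketch below implements one robust Bellman backup under an (s,a)-rectangular total-variation (TV) uncertainty set in the tabular setting. This is only an illustrative reconstruction: the choice of TV distance, the function names, and all parameters are our assumptions, not the algorithm from the talk.

```python
import numpy as np

def worst_case_expectation_tv(p, v, sigma):
    """Worst-case expectation of v over the total-variation ball
    {q : 0.5 * ||q - p||_1 <= sigma} around the nominal distribution p.
    The infimum is attained by moving up to sigma probability mass from
    the highest-value states onto the single lowest-value state."""
    s_min = np.argmin(v)
    q = p.astype(float)
    budget = min(sigma, 1.0 - q[s_min])   # mass available to relocate
    moved = 0.0
    for s in np.argsort(v)[::-1]:         # states with largest v first
        if s == s_min:
            continue
        take = min(q[s], budget - moved)
        q[s] -= take
        moved += take
        if moved >= budget:
            break
    q[s_min] += moved
    return q @ v

def robust_bellman_backup(P, R, V, sigma, gamma=0.95):
    """One robust value-iteration sweep: P[s, a] is the nominal transition
    row and R[s, a] the reward; each (s, a) pair carries its own TV ball
    of radius sigma around P[s, a] ((s, a)-rectangularity)."""
    S, A = R.shape
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = R[s, a] + gamma * worst_case_expectation_tv(P[s, a], V, sigma)
    return Q.max(axis=1)   # greedy robust value update
```

Iterating robust_bellman_backup to convergence yields the robust value function; setting sigma = 0 recovers standard value iteration, which is one way to see how the robustness requirement interpolates with the standard MDP.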
Talk 2: Learning and Control in Countable State Spaces
Speaker: Rayadurgam Srikant
Abstract: We will consider policy optimization methods in reinforcement learning where the state space is countably infinite. The motivation arises from control problems in communication networks and matching markets. Specifically, we consider the popular Natural Policy Gradient (NPG) algorithm, which has previously been studied only under the assumptions that the cost is bounded and the state space is finite, neither of which holds for the aforementioned control problems. Assuming a Lyapunov drift condition, which is naturally satisfied in some cases and can be satisfied in others at a small cost in performance, we design a state-dependent step-size rule that dramatically improves the performance of NPG for our intended applications. In addition to experimentally verifying the performance improvement, we show theoretically that the iteration complexity of NPG can be made independent of the size of the state space. The key analytical tool we use is the connection between NPG step sizes and the solution to Poisson’s equation. In particular, we provide policy-independent bounds on the solution to Poisson’s equation, which are then used to guide the choice of NPG step sizes.
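For reference, the two objects the abstract connects can be written out explicitly; the display below uses our own notation and sketches only the standard setup, not the talk's precise step-size rule.

```latex
% Poisson's equation for the average-cost criterion under a policy \pi:
% the average cost J_\pi and relative value function V_\pi satisfy
\[
  (I - P_\pi)\, V_\pi \;=\; c_\pi - J_\pi \mathbf{1},
  \qquad \text{i.e.,} \qquad
  V_\pi(s) \;=\; c_\pi(s) - J_\pi + \sum_{s'} P_\pi(s' \mid s)\, V_\pi(s').
\]
% Softmax NPG with a state-dependent step size \eta(s) (cost minimization):
\[
  \pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\,
    \exp\!\bigl(-\eta(s)\, Q_{\pi_t}(s, a)\bigr).
\]
```

Bounds on the solution V_\pi that hold uniformly over policies are what allow the step size \eta(s) to be chosen without knowing the current policy, even though the state space is countably infinite.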
Talk 3: Distributionally Robust Control via Optimal Transport
Speaker: Liviu Aolaritei
Abstract: In this talk I will challenge the standard uncertainty models, i.e., robust (norm-bounded) and stochastic (a single fixed distribution, e.g., Gaussian), and propose instead to model uncertainty in dynamical systems via Optimal Transport (OT) ambiguity sets. I will then show that OT ambiguity sets are analytically tractable: they propagate easily and intuitively through linear and nonlinear maps (possibly corrupted by noise), and the result of the propagation is again an OT ambiguity set or can be tightly upper-bounded by one. In the context of dynamical systems, this allows us to account for multiple sources of uncertainty (e.g., initial condition, additive noise, multiplicative noise) and to capture in closed form, via an OT ambiguity set, the resulting uncertainty in the state at any future time. The resulting OT ambiguity sets are also computationally tractable and can be directly employed in various distributionally robust control formulations that optimally trade off safety against performance.
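As one concrete instance of the propagation claim for a linear system x_{t+1} = A x_t + w_t, the display below records a standard OT fact in our own notation; it is an illustrative assumption about the setting, not the exact statements of the talk.

```latex
% Type-p Wasserstein (OT) ambiguity ball of radius \varepsilon around \mu:
\[
  \mathbb{B}_{\varepsilon}(\mu) \;=\; \bigl\{ \nu : W_p(\nu, \mu) \le \varepsilon \bigr\}.
\]
% Propagation through x_{t+1} = A x_t + w_t, with x_t distributed in
% \mathbb{B}_{\varepsilon_t}(\mu_t) and independent noise w_t in \mathbb{B}_{\delta}(\rho):
\[
  A_{\#}\, \mathbb{B}_{\varepsilon_t}(\mu_t) \;\subseteq\;
    \mathbb{B}_{\|A\|\,\varepsilon_t}\bigl(A_{\#}\mu_t\bigr),
  \qquad
  \mu_{t+1} = (A_{\#}\mu_t) * \rho,
  \quad
  \varepsilon_{t+1} = \|A\|\,\varepsilon_t + \delta,
\]
% where \|A\| is the operator norm, {}_{\#} denotes the pushforward, and *
% is convolution; the radius recursion uses the coupling inequality
% W_p(\mu_1 * \nu_1, \mu_2 * \nu_2) \le W_p(\mu_1, \mu_2) + W_p(\nu_1, \nu_2).
```

The recursion illustrates why the state uncertainty at any future time is again captured, up to a tight upper bound, by a single OT ambiguity ball whose center evolves like the nominal distribution and whose radius grows linearly per step.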