Name: Parallel Sessions 1F: Optimization as the Engine of Generative AI - I
Start: 2025-07-21T10:30:00-0700
End: 2025-07-21T11:45:00-0700

Monday July 21, 2025 10:30am - 11:45am PDT

Joseph Medicine Crow Center for International and Public Affairs (DMC) 156

Session: Optimization as the Engine of Generative AI - I
Chair: Yinbin Han
Cluster: Optimization for Emerging Technologies (LLMs, Quantum Computing, ...)

Talk 1: InfAlign: Inference-aware language model alignment
Speaker: Theertha Suresh
Abstract: Language model alignment is a critical step in training modern generative language models. Alignment aims to improve the win rate of a sample from the aligned model against the base model. Today, we increasingly use inference-time algorithms (e.g., Best-of-N, controlled decoding, tree search) to decode from language models rather than standard sampling. In this talk, we first give an overview of different inference-time algorithms and the standard RLHF procedure. We then show that this train/test mismatch makes the standard RLHF framework sub-optimal in view of such inference-time methods. To this end, we propose a framework for inference-aware alignment (InfAlign), which aims to optimize the inference-time win rate of the aligned policy against the base model. We prove that for any inference-time decoding procedure, the optimal aligned policy is the solution to the standard RLHF problem with a transformation of the reward. This motivates the calibrate-and-transform RL (InfAlign-CTRL) algorithm for solving this problem, which involves a reward calibration step and a KL-regularized reward maximization step with a transformation of the calibrated reward. For best-of-N sampling and best-of-N jailbreaking, we propose specific transformations offering up to 3-8% improvement in inference-time win rates. Finally, we also show that our proposed reward calibration method is a strong baseline for optimizing standard win rate.
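
Illustration (not from the talk): the two quantities the abstract refers to, Best-of-N decoding and win rate against the base model, can be sketched on a toy problem. This is a minimal sketch under stated assumptions: a "response" is just a float drawn from a Gaussian, the reward is the identity, and the names `best_of_n`, `win_rate`, `base_policy`, and `reward` are hypothetical. It does not implement InfAlign or InfAlign-CTRL; it only shows how an inference-time procedure changes the win rate of what is actually decoded.

```python
import random


def best_of_n(sample, reward, n, rng):
    """Draw n candidates from `sample` and return the one with the highest reward."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)


def win_rate(sample_a, sample_b, reward, trials, rng):
    """Monte Carlo estimate of P(reward(a) > reward(b)); a tie counts as half a win."""
    wins = 0.0
    for _ in range(trials):
        ra = reward(sample_a(rng))
        rb = reward(sample_b(rng))
        if ra > rb:
            wins += 1.0
        elif ra == rb:
            wins += 0.5
    return wins / trials


# Toy stand-ins (assumptions, not a real language model or reward model).
def base_policy(rng):
    return rng.gauss(0.0, 1.0)


def reward(y):
    return y


rng = random.Random(0)
trials = 20000

# Standard sampling: base model against itself.
standard = win_rate(base_policy, base_policy, reward, trials, rng)

# Best-of-4 decoding from the base model against a single base sample.
bo4 = win_rate(
    lambda r: best_of_n(base_policy, reward, 4, r),
    base_policy,
    reward,
    trials,
    rng,
)

print(f"standard win rate: {standard:.3f}")
print(f"best-of-4 win rate: {bo4:.3f}")
```

For i.i.d. continuous rewards, the best of N samples beats one independent sample with probability N/(N+1), so the second estimate should land near 0.8 while the first stays near 0.5. The gap between the two is the kind of train/test mismatch the abstract describes.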

Talk 2: LLMs for MILP Solver Configuration
Speaker: Connor Lawless
Abstract: Mixed integer linear programming (MILP) solvers ship with a staggering number of parameters that are challenging to select a priori for all but expert optimization users, but can have an outsized impact on the performance of the MILP solver. We introduce a new LLM-based framework to configure which cutting plane separators to use for a given MILP problem with little to no training data based on characteristics of the instance, such as a natural language description of the problem and the associated LaTeX formulation. Our LLM-based methodology requires no custom solver interface, can find a high-performing configuration by solving only a small number of MILPs, and can generate the configuration with simple API calls that run in under a second.

Talk 3: Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence
Speaker: Yinbin Han
Abstract: Diffusion models have emerged as powerful tools for generative modeling, demonstrating exceptional capability in capturing target data distributions from large datasets. However, fine-tuning these massive models for specific downstream tasks, constraints, and human preferences remains a critical challenge. While recent advances have leveraged reinforcement learning algorithms to tackle this problem, much of the progress has been empirical, with limited theoretical understanding. To bridge this gap, we propose a stochastic control framework for fine-tuning diffusion models. Building on denoising diffusion probabilistic models as the pre-trained reference dynamics, our approach integrates linear dynamics control with Kullback-Leibler regularization. We establish the well-posedness and regularity of the stochastic control problem and develop a policy iteration algorithm (PI-FT) for its numerical solution. We show that PI-FT achieves global convergence at a linear rate. Unlike existing work that assumes regularity throughout training, we prove that the control and value sequences generated by the algorithm remain regular. Additionally, we explore extensions of our framework to parametric settings and continuous-time formulations.

ICCOPT2025USC