Session: Optimization for Large Language Models and Kernels
Chair: Ming Yin
Cluster: Optimization Applications (Communication, Energy, Health, ML, ...)
Talk 1: Optimizing for a Proxy Reward in RLHF
Speaker: Banghua Zhu
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become an important technique in the post-training of Large Language Models (LLMs). During RLHF, one usually first trains a reward model from human preference data, and then optimizes the LLM for the proxy reward signal predicted by the reward model. In this talk, I'll discuss what makes a good reward model for RLHF, drawing on both theoretical and empirical observations.
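For readers unfamiliar with the pipeline the abstract refers to, the following is a minimal sketch (not the speaker's code; all names are illustrative) of the two stages: fitting a reward model on preference pairs with a Bradley-Terry loss, then optimizing the policy against that proxy reward with a KL penalty toward a reference model.

# Minimal sketch of the standard two-stage RLHF pipeline (illustrative only).
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Negative log-likelihood that the chosen response beats the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def kl_regularized_objective(proxy_reward: torch.Tensor,
                             logp_policy: torch.Tensor,
                             logp_reference: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    # Maximize the proxy reward while penalizing divergence from the reference
    # policy; returned as a loss to minimize.
    kl = logp_policy - logp_reference
    return -(proxy_reward - beta * kl).mean()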
Talk 2: Self-Play Preference Optimization for Language Model Alignment
Speaker: Yue Wu
Abstract: In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at optimizing the model to approximate the Nash equilibrium. Our approach, dubbed SPPO, is based on a new alignment objective derived from L2 regression. Interestingly, this new objective has a deep connection with KL-regularized policy gradient and natural gradient methods, and can guarantee convergence to the optimal solution. In our experiments, this theoretically motivated objective turns out to be highly effective. By leveraging a small pre-trained preference model, SPPO can obtain a highly aligned model without additional external supervision from humans or stronger language models.
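The sketch below illustrates one plausible form of the squared-loss (L2 regression) objective the abstract alludes to: regressing the log-density ratio between the current policy and the previous iterate onto the centered win probability from a small preference model. The exact loss used in SPPO may differ; this is an assumed, illustrative version.

# Illustrative sketch of an SPPO-style squared-loss alignment objective (assumed form).
import torch

def sppo_style_loss(logp_new: torch.Tensor,   # log pi_theta(y|x), current policy
                    logp_old: torch.Tensor,   # log pi_t(y|x), previous iterate
                    win_prob: torch.Tensor,   # estimated P(y beats pi_t | x)
                    eta: float = 1.0) -> torch.Tensor:
    # L2 regression: push the log-ratio toward the centered preference signal
    # eta * (P(y wins) - 1/2), so preferred responses gain probability mass.
    log_ratio = logp_new - logp_old
    target = eta * (win_prob - 0.5)
    return ((log_ratio - target) ** 2).mean()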
Talk 3: Learning Counterfactual Distributions via Kernel Nearest Neighbors
Speaker: Kyuseong Choi
Abstract: Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the distribution of a user's weekly spend and engagement under a specific mobile app version. A common challenge is the prevalence of data missing not at random: observations are available only for certain unit-outcome combinations, and the observed distributions can be correlated with properties of the distributions themselves, i.e., there is unobserved confounding. An additional challenge is that for any observed unit-outcome entry, we only have a finite number of samples from the underlying distribution. We tackle these two challenges by casting the problem into a novel distributional matrix completion framework and introducing a kernel-based distributional generalization of nearest neighbors to estimate the underlying distributions. By leveraging maximum mean discrepancies and a suitable factor model on the kernel mean embeddings of the underlying distributions, we establish consistent recovery of the underlying distributions even when data is missing not at random and positivity constraints are violated. Furthermore, we demonstrate that our nearest neighbors approach is robust to heteroscedastic noise, provided we have access to two or more measurements for the observed unit-outcome entries, a robustness not present in prior works on nearest neighbors with single measurements.
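As a rough illustration of the two ingredients the abstract combines (this is not the authors' implementation, and all names are hypothetical), one can compute a maximum mean discrepancy between the empirical samples of two unit-outcome entries and then use those distances to select nearest-neighbor entries whose pooled samples estimate a missing distribution.

# Illustrative sketch: MMD distance between empirical samples plus a
# nearest-neighbor rule over unit-outcome entries.
import numpy as np

def rbf_kernel(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    # Gram matrix of the Gaussian (RBF) kernel between sample sets x (n, d) and y (m, d).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    # Biased estimate of the squared maximum mean discrepancy between the
    # empirical distributions of x and y.
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def kernel_nearest_neighbors(target: np.ndarray, candidates: list, k: int = 5) -> list:
    # Indices of the k candidate entries closest to the target entry in MMD;
    # their pooled samples can serve as an estimate of the missing distribution.
    dists = [mmd2(target, c) for c in candidates]
    return list(np.argsort(dists)[:k])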