Session: Efficient Optimization Methods for LLMs (Part II)
Chair: Xiao Li
Cluster: Optimization for Emerging Technologies (LLMs, Quantum Computing, ...)
Talk 1: On Memory-Efficient Block Coordinate Descent Methods for LLM Training
Speaker: Xiao Li
Abstract: This talk presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory-efficient approach to full-parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and four A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, a downstream performance evaluation on MT-bench and math benchmarks shows that BAdam outperforms existing memory-efficient baselines such as LoRA, and that it achieves performance comparable to, or even better than, Adam. Finally, an ablation study using SGD's update rule illustrates the suitability of BCD for training LLMs.
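As a rough illustration of the block coordinate descent idea described above (a simplified sketch, not the speaker's BAdam implementation), the PyTorch snippet below partitions a model's parameters by submodule and runs Adam on one block at a time, so optimizer state is only ever allocated for the active block; the block partition and the switch interval are hypothetical choices made for the example.

    import torch

    def bcd_adam_train(model, data_loader, loss_fn, switch_every=100, lr=1e-5):
        # Hypothetical partition: one block per top-level submodule (e.g., per layer).
        blocks = [list(m.parameters()) for m in model.children() if list(m.parameters())]
        active, step, optimizer = 0, 0, None
        for batch, target in data_loader:
            if step % switch_every == 0:
                # Freeze everything, then unfreeze the next block and rebuild Adam,
                # so first/second-moment buffers exist only for the active block.
                for p in model.parameters():
                    p.requires_grad_(False)
                for p in blocks[active]:
                    p.requires_grad_(True)
                optimizer = torch.optim.Adam(blocks[active], lr=lr)
                active = (active + 1) % len(blocks)
            loss = loss_fn(model(batch), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1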
Talk 2: Pre-training of an LLM at KAUST: Mistakes and Learned Lessons
Speaker: Francesco Orabona
Abstract: Because of the heavy computational resources required, pre-training a large-scale LLM is usually done only by big software companies. This also means that there are many phenomena and problems that very few academics have had the chance to observe. At KAUST, we have attempted to pre-train an Arabic-English LLM, taking care of all the required steps, from data curation to tokenization, from the choice of the optimization algorithm to the appropriate evaluation procedure. In this talk, I'll give an overview of the lessons we have learned throughout this process, highlighting the open problems for the academic community.
Talk 3: Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
Speaker: Frank Schneider
Abstract: The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This talk presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule-Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress made so far and the considerable room for further improvement.
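To make the time-to-result protocol concrete, here is a minimal sketch under assumed interfaces (train_step, evaluate, and the target value are hypothetical placeholders; this is not the actual AlgoPerf benchmarking harness): a submission is essentially scored by the wall-clock time at which it first reaches a fixed validation target on a given workload.

    import time

    def time_to_result(train_step, evaluate, target, max_seconds, eval_every=100):
        # Run the submitted training algorithm on fixed hardware and report the
        # wall-clock time at which the validation target is first reached.
        start, step = time.monotonic(), 0
        while time.monotonic() - start < max_seconds:
            train_step()                              # one step of the submission
            step += 1
            if step % eval_every == 0 and evaluate() >= target:
                return time.monotonic() - start
        return float("inf")                           # target not reached in budget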