The Ultra-Scale Talk: Scaling Training to Thousands of GPUs
May 7 • 14:40 - 15:00
Location: Central Room (Updated)
Training large language models (LLMs) demands more than just raw compute—it requires infrastructure, strategy, and a deep understanding of parallelism. What begins as a single-GPU prototype must eventually scale across thousands of devices, each step introducing new complexity.
This talk dives into the practicalities of ultra-scale training. We'll explore how 5D parallelism—spanning data, tensor, pipeline, context, and expert dimensions—makes it possible to stretch a single training run across massive GPU clusters. Along the way, we’ll cover performance tuning, communication patterns, and architecture choices that impact throughput and hardware efficiency.
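To make the cluster arithmetic concrete, here is a minimal sketch (in Python, with illustrative names and a rank layout we assume for this example, not taken from the talk or the playbook) of how the degrees chosen along each dimension multiply together to fill a GPU cluster; in many setups expert parallelism is laid out inside the data-parallel group rather than adding a fifth factor.

```python
from dataclasses import dataclass

@dataclass
class ParallelLayout:
    """Hypothetical sketch of a 5D parallelism layout (names assumed)."""
    dp: int       # data parallelism: model replicas processing different batches
    tp: int       # tensor parallelism: shards of each weight matrix within a layer
    pp: int       # pipeline parallelism: consecutive layer stages on different GPUs
    cp: int       # context parallelism: shards of the sequence (context) dimension
    ep: int = 1   # expert parallelism: in many MoE setups this is carved out of the
                  # data-parallel group, so it does not add another factor below

    def world_size(self) -> int:
        # A single training run spans the product of the first four degrees.
        return self.dp * self.tp * self.pp * self.cp

# Example: 64-way data x 8-way tensor x 4-way pipeline x 2-way context
# parallelism fills a 4096-GPU cluster.
layout = ParallelLayout(dp=64, tp=8, pp=4, cp=2, ep=8)
assert layout.world_size() == 4096
```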
A key reference for this session is the Ultra-Scale Playbook, which distills best practices and hard-earned lessons from real-world LLM scaling efforts. We’ll walk through highlights of the playbook, tying them into case studies, benchmarks, and hands-on recommendations.
Scaling isn’t just about size; it’s about doing more with what you have. This webinar offers a comprehensive look at what it really takes to train state-of-the-art models at scale. It is designed for engineers, researchers, and practitioners ready to move beyond “it fits on one GPU” toward infrastructure that powers trillion-parameter models efficiently and at speed.