
PyTorch Day France

Advancing Mamba in PyTorch

May 7

16:00 - 16:20

Location: Central Room

Mamba layers are efficient alternatives to standard attention: their training complexity is linear in sequence length, while inference is sequence-length-independent and requires only a small cache. I will discuss a selection of IBM's ongoing work advancing the state of Mamba training in PyTorch, including context-parallel training for long-sequence data, Mamba + mixture-of-experts support with expert parallelism, torch-native associative scan ops, and improved DTensor op support.
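To make the complexity claims concrete, here is a minimal sketch (not IBM's implementation, and with illustrative names only) of the scalar state-space recurrence behind Mamba-style layers, h_t = a_t·h_{t-1} + b_t·x_t with output y_t = c_t·h_t. It shows the linear-time full-sequence pass used in training, the O(1)-per-token step used in inference (only the state is cached), and the associative combine that makes the recurrence expressible as a parallel scan:

```python
from functools import reduce

def scan_full_sequence(a, b, c, x):
    """Linear-time pass over a length-T sequence (training-style)."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t   # recurrent state update
        ys.append(c_t * h)        # per-step output
    return ys

def step(state, a_t, b_t, c_t, x_t):
    """Single inference step: cost is independent of sequence length;
    only the previous state is needed (the 'small cache')."""
    h = a_t * state + b_t * x_t
    return h, c_t * h

def combine(p, q):
    """Associative combine: composing two recurrence steps
    (a1, u1) then (a2, u2) yields (a1*a2, a2*u1 + u2).
    Associativity is what lets a parallel/context-parallel scan
    evaluate the recurrence across chunks of the sequence."""
    a1, u1 = p
    a2, u2 = q
    return (a1 * a2, a2 * u1 + u2)

# Toy inputs (illustrative values)
a = [0.9, 0.8, 0.7]
b = [1.0, 1.0, 1.0]
c = [0.5, 0.5, 0.5]
x = [1.0, 2.0, 3.0]

ys_train = scan_full_sequence(a, b, c, x)

# Replay one token at a time with only a scalar state cached
h, ys_step = 0.0, []
for t in range(len(x)):
    h, y = step(h, a[t], b[t], c[t], x[t])
    ys_step.append(y)

assert ys_train == ys_step  # same outputs, constant-size inference cache

# The final state can also be computed by folding the associative combine
_, h_scan = reduce(combine, [(a[t], b[t] * x[t]) for t in range(len(x))])
assert abs(h_scan - h) < 1e-12
```

This is the idea the abstract's "torch-native associative scan ops" refer to: because `combine` is associative, the sequential loop can be replaced by a tree-structured scan and sharded across devices for context-parallel training.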

Speakers