Mamba layers are efficient alternatives to standard attention: their training complexity is linear in sequence length, while inference is independent of sequence length and requires only a small cache. I will discuss a selection of IBM's ongoing work advancing the state of Mamba training in PyTorch, including: context-parallel training for long-sequence data, Mamba + mixture-of-experts support with expert parallelism, torch-native associative scan ops, and improved DTensor op support. A minimal sketch of the associative-scan view of the Mamba recurrence, which underlies the linear-time claim, follows below.
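
The sketch below is illustrative only, not the talk's actual implementation or PyTorch's associative scan op: it shows why the per-channel linear recurrence of an SSM/Mamba layer can be expressed as an associative scan, which is what enables parallel (and context-parallel) training in linear work. The helper names (`combine`, `associative_scan`, `sequential_scan`) are assumptions introduced for this example.

```python
import torch

# Per-channel linear SSM recurrence behind a Mamba layer:
#     h_t = a_t * h_{t-1} + b_t
# Each step is the pair (a_t, b_t), i.e. the affine map h -> a_t*h + b_t.
# Composing a later step (a2, b2) after an earlier one (a1, b1) gives
#     (a2 * a1, a2 * b1 + b2),
# which is associative, so the whole sequence is a prefix scan rather
# than an inherently sequential loop.

def combine(earlier, later):
    """Associative combine of two (a, b) pairs; `earlier` is applied first."""
    a1, b1 = earlier
    a2, b2 = later
    return a2 * a1, a2 * b1 + b2

def sequential_scan(a, b):
    """Reference O(T) sequential recurrence (h_0 = 0)."""
    h = torch.zeros_like(b[0])
    hs = []
    for t in range(a.shape[0]):
        h = a[t] * h + b[t]
        hs.append(h)
    return torch.stack(hs)

def associative_scan(a, b):
    """Inclusive scan by recursive doubling: O(log T) sequential steps,
    each step a batched element-wise op -- the parallel-friendly form."""
    T = a.shape[0]
    a, b = a.clone(), b.clone()
    shift = 1
    while shift < T:
        # Prefix state shifted down by `shift`, identity (1, 0) at the front.
        a_prev = torch.ones_like(a)
        b_prev = torch.zeros_like(b)
        a_prev[shift:] = a[:-shift]
        b_prev[shift:] = b[:-shift]
        a, b = combine((a_prev, b_prev), (a, b))
        shift *= 2
    return b  # with h_0 = 0, b now holds h_t for every t

if __name__ == "__main__":
    T, D = 128, 16                 # sequence length, channels
    a = torch.rand(T, D) * 0.9     # decay terms in [0, 0.9)
    b = torch.randn(T, D)          # input terms
    assert torch.allclose(sequential_scan(a, b), associative_scan(a, b), atol=1e-4)
```

Because the combine operator is associative, the same recurrence can also be split across devices along the sequence dimension and stitched together with a small per-channel carry state, which is the property context-parallel training exploits.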