WGML: The Story of Building a New High-performance, Cross-platform, On-device Inference Framework
May 6
16:20 - 17:00
Ever dreamed of writing a new low-level LLM inference library? Check out the making of WGML, an open-source, high-performance, cross-platform GPU inference framework built with Rust and WebGPU. We will cover tips for learning how LLM inference works and for implementing it.