Writings

systems programming, deep learning, and the tools i build

A deep dive into PagedAttention, custom CUDA kernels, and what it takes to build a production-grade inference runtime in C++.