systems programming, deep learning, and the tools i build
A deep dive into PagedAttention, custom CUDA kernels, and what it takes to build a production-grade inference runtime in C++.