Writings

systems programming, deep learning, and the tools i build

Building an LLM Inference Engine from Scratch

A deep dive into PagedAttention, custom CUDA kernels, and what it takes to build a production-grade inference runtime in C++.

Mar 10, 2026 · c++, cuda, inference, systems