NanoGPT-inference: How to build LLM inference from scratch

Learn to build efficient and economical LLM inference from scratch. This talk reveals techniques to speed up inference, with code extensions to NanoGPT.

Overview

LLM inference, the engineering behind serving LLMs efficiently and economically, is becoming increasingly important. In this post, I’ll walk through several techniques for speeding up LLM inference. I’m also releasing the code for each inference engine as a simple extension to Karpathy’s NanoGPT.

Links

Tech stack