NanoGPT: Speeding Up LLM Inference
Learn how to build efficient, economical LLM inference from scratch. This talk covers techniques for speeding up inference, with code released as extensions to NanoGPT.
LLM inference, the engineering behind serving LLMs efficiently and economically, is becoming increasingly important. In this post, I’ll show you how to speed up LLM inference using several techniques. I’m also releasing the code for each inference engine as a simple extension of Karpathy’s NanoGPT.