Episode notes
Learn how to demystify large language models by building GPT-2 from scratch — in a spreadsheet. In this episode, MIT engineer Ishan Anand breaks down the inner workings of transformers in a way that’s visual, interactive, and beginner-friendly, yet deeply technical for experienced builders.
What you’ll learn:
• How GPT-2 became the architectural foundation for modern LLMs like ChatGPT, Claude, Gemini, and LLaMA.
• The three major innovations since GPT-2 — mixture of experts, RoPE (rotary position embeddings), and advances in training — and how they changed AI performance.
• A clear explanation of tokenization, attention, and transformer blocks that you can see and manipulate in real time.
• How to implement GPT-2’s core in ~600 lines of code and why that understanding makes you a better AI builder.
• The ...
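For a taste of the kind of mechanism the episode walks through, here is a minimal sketch of single-head causal self-attention, the core operation a GPT-2 block repeats. This is an illustrative NumPy sketch, not code from the episode or the spreadsheet; the function and weight names (`causal_attention`, `Wq`, `Wk`, `Wv`) are placeholders chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (tokens, dim) matrix x.

    Illustrative sketch: project tokens to queries/keys/values, score every
    query against every key, mask out future positions, and mix the values.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # scaled dot-product similarity
    # Causal mask: each token may attend only to itself and earlier tokens.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    return softmax(scores) @ v             # weighted average of values
```

Because of the causal mask, the first token can only attend to itself, so its output is exactly its own value vector — the same behavior you can observe cell by cell in the spreadsheet version.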