Creator

3Blue1Brown

Grant Sanderson's channel of visually driven math and deep learning explainers, famous for animating hard ideas until they click.

1 video


27:14
3Blue1Brown

Transformers, the tech behind LLMs | Deep Learning Chapter 5

3Blue1Brown's visual introduction to how transformers, the T in GPT, actually work. Grant Sanderson follows one stream of data through the network: text is split into tokens, each token becomes a high-dimensional vector via the embedding matrix, attention and multilayer perceptron blocks refine those vectors layer by layer, and a final unembedding plus softmax step turns the last vector into a probability distribution over the next token. He grounds every idea in real GPT-3 numbers (175 billion parameters, 12,288 embedding dimensions, a 50,257-token vocabulary) and shows how directions in embedding space carry meaning and how temperature reshapes the output distribution. It is the foundation chapter that sets up the later deep dive on attention.
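The last step of that pipeline, softmax with temperature, can be sketched in a few lines of Python. This is a minimal illustration, not code from the video: the function name `softmax_with_temperature` and the four toy logits are assumptions made up for the example. The closing lines work out the size of the embedding matrix from the GPT-3 figures quoted above.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities.

    The logits are divided by the temperature before the softmax:
    values below 1 sharpen the distribution, values above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a toy 4-token vocabulary (not real GPT-3 values).
toy_logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")

# Embedding matrix size implied by the GPT-3 figures above:
# one 12,288-dimensional vector per vocabulary token.
VOCAB_SIZE = 50_257
EMBEDDING_DIM = 12_288
embedding_params = VOCAB_SIZE * EMBEDDING_DIM
print(embedding_params)  # 617558016
```

At a low temperature most of the probability mass lands on the highest-scoring token, while a higher temperature spreads it more evenly across the alternatives.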

AI · Science · Apr 1, 2024