Training vs Inference: Two Modes
The Forward Pass
You've seen the transformer architecture: embeddings flow through attention and feed-forward blocks, then the LM head produces predictions. This flow from input to output is called the forward pass.
Every time you ask ChatGPT a question, it runs this forward pass. Your text goes in, predictions come out. The weights (all those learned numbers in the matrices) stay frozen - they don't change.
This is inference mode: using a trained model to make predictions.
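In code, inference is just the forward pass with gradient tracking turned off. A minimal PyTorch sketch - the two-layer model, vocabulary size, and token IDs below are made-up stand-ins, not the course's actual model:

```python
import torch

# Minimal stand-in for a trained transformer: an embedding table
# plus an LM head. Sizes and token IDs are made up for illustration.
model = torch.nn.Sequential(
    torch.nn.Embedding(100, 16),  # token IDs -> 16-dim vectors
    torch.nn.Linear(16, 100),     # "LM head": vectors -> vocabulary scores
)

token_ids = torch.tensor([[5, 42, 7]])  # a tokenized prompt

model.eval()              # inference mode: disable training-only behavior
with torch.no_grad():     # track no gradients - the weights stay frozen
    logits = model(token_ids)                     # the forward pass
    probs = torch.softmax(logits[0, -1], dim=-1)  # next-token distribution

print(probs.argmax().item())  # the most likely next token ID
```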
But Where Do the Weights Come From?
The weights weren't always useful. Before training, they're random numbers. A random transformer produces garbage - it might predict "banana" after "the cat sat on the".
Training is how those random weights become useful pattern detectors. The model processes billions of text examples, compares its predictions to the actual next words, and gradually adjusts its weights to reduce the error.
Example by example, the random numbers transform into the patterns that make GPT-4 useful.
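You can see the "garbage" claim directly: a freshly initialized layer spreads probability almost evenly across the vocabulary, so its top prediction is arbitrary. A tiny illustration (the sizes and the random hidden state are invented for the demo):

```python
import torch

torch.manual_seed(0)

# An untrained "LM head": its weights are just random numbers.
random_head = torch.nn.Linear(16, 100)

hidden = torch.randn(16)  # stand-in for a hidden state from the transformer
probs = torch.softmax(random_head(hidden), dim=-1)

print(probs.max().item())     # small - no token is strongly preferred
print(probs.argmax().item())  # an arbitrary token ID ("banana", in effect)
```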
Training Adds the Backward Pass
Training needs more than the forward pass. After each prediction, the model must figure out how to improve.
The backward pass (backpropagation) computes gradients - essentially asking "if I nudge this weight up or down, does the error get better or worse?" This requires storing intermediate values from the forward pass and running calculations backward through every layer.
Inference skips all of this. Weights are frozen. Just run forward and output the prediction.
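Here is the contrast in code - a minimal sketch where one linear layer stands in for the whole transformer, with made-up inputs and targets:

```python
import torch

model = torch.nn.Linear(16, 100)  # stand-in for the full transformer
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 16)                 # a batch of inputs (invented)
targets = torch.randint(0, 100, (8,))  # the "actual next words"

# One training step: forward AND backward.
logits = model(x)                                    # forward pass
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()   # backward pass: a gradient for every weight
opt.step()        # nudge each weight in the direction that lowers the loss
opt.zero_grad()

# Inference: forward only, weights frozen.
with torch.no_grad():
    logits = model(x)  # same computation, nothing stored for backprop
```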
Why This Course Focuses on Inference
Understanding the forward pass is the foundation. It's where you learn what each component actually does (the sketch after this list traces each step in code):
- How embeddings convert words to vectors
- How attention finds relationships between words
- How feed-forward networks transform representations
- How the LM head produces vocabulary scores
- How softmax converts scores to probabilities
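Here's that whole flow as bare matrix operations - a single attention head with no masking, layer norm, or stacked layers, and random stand-in weights, so the output is meaningless; the point is the shape of the computation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, d = 100, 16

tok = torch.tensor([3, 17, 9])   # "the cat sat" as made-up token IDs
E = torch.randn(vocab, d)        # embedding table
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
W_ff1, W_ff2 = torch.randn(d, 4 * d), torch.randn(4 * d, d)
lm_head = torch.randn(d, vocab)

x = E[tok]                                        # 1. words -> vectors
q, k, v = x @ W_q, x @ W_k, x @ W_v               # 2. attention relates positions
x = x + F.softmax(q @ k.T / d**0.5, dim=-1) @ v   #    (plus residual)
x = x + F.relu(x @ W_ff1) @ W_ff2                 # 3. feed-forward transform
logits = x @ lm_head                              # 4. LM head: vocabulary scores
probs = F.softmax(logits[-1], dim=-1)             # 5. scores -> probabilities

print(probs.shape)  # torch.Size([100]): one probability per vocabulary entry
```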
Training adds machinery on top of the forward pass (gradients, optimization, loss functions), but the core computation is the same. Once you understand how data flows forward through the transformer, you understand what the model is doing - whether in training or inference.
This course teaches the forward pass in detail. By Module 4, you'll understand every step from "the cat" to predicting "sat". That's the complete inference pipeline - and the foundation for understanding training if you go deeper.
Our Pre-Trained Tiny Model
We'll work with a tiny GPT that's already trained. The weights are learned and frozen - we're purely in inference mode. You'll load these weights, run the forward pass, and see exactly how predictions emerge from the architecture.
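In PyTorch terms, loading frozen weights looks roughly like this. Everything here - the stand-in architecture, the file name, the token IDs - is a placeholder; the course's actual tiny GPT will differ:

```python
import torch

# Hypothetical stand-in for the course's tiny GPT architecture.
def make_model():
    return torch.nn.Sequential(
        torch.nn.Embedding(100, 16),
        torch.nn.Linear(16, 100),
    )

# This part happened "beforehand": training (omitted here) produced saved weights.
torch.save(make_model().state_dict(), "tiny_gpt.pt")

# What you'll do: build the architecture, then load the trained weights.
model = make_model()
model.load_state_dict(torch.load("tiny_gpt.pt"))
model.eval()  # purely inference mode from here on

with torch.no_grad():
    logits = model(torch.tensor([[3, 17]]))  # the forward pass on a new input
```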
The training happened beforehand on simple English sentences. What you'll study is how those trained weights process new inputs to make predictions. That's inference - and that's what this course teaches.