Training vs Inference: Two Modes
The Forward Pass
You've seen the transformer architecture: embeddings flow through attention and feed-forward blocks, then the LM head produces predictions. This flow from input to output is called the forward pass.
Every time you ask ChatGPT a question, it runs this forward pass. Your text goes in, predictions come out. The weights (all those learned numbers in the matrices) stay frozen - they don't change.
This is inference mode: using a trained model to make predictions.
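In code, inference is just the forward pass with gradient tracking turned off. A minimal PyTorch sketch - the two-layer model, vocabulary size, and token IDs below are made-up stand-ins, not the course's actual model:

```python
import torch

# Minimal stand-in for a trained transformer: an embedding table
# plus an LM head. Sizes and token IDs are made up for illustration.
model = torch.nn.Sequential(
    torch.nn.Embedding(100, 16),  # token IDs -> 16-dim vectors
    torch.nn.Linear(16, 100),     # "LM head": vectors -> vocabulary scores
)

token_ids = torch.tensor([[5, 42, 7]])  # a tokenized prompt

model.eval()              # inference mode: disable training-only behavior
with torch.no_grad():     # track no gradients - the weights stay frozen
    logits = model(token_ids)                     # the forward pass
    probs = torch.softmax(logits[0, -1], dim=-1)  # next-token distribution

print(probs.argmax().item())  # the most likely next token ID
```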
But Where Do the Weights Come From?
The weights weren't always useful. Before training, they're random numbers. A random transformer produces garbage - it might predict "banana" after "the cat sat on the".
Training is how those random weights become useful pattern detectors. The model processes billions of text examples, compares its predictions to the actual next words, and gradually adjusts its weights to reduce the error.
Example by example, the random numbers transform into the patterns that make GPT-4 useful.
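You can see the "garbage" claim directly: a freshly initialized layer spreads probability almost evenly across the vocabulary, so its top prediction is arbitrary. A tiny illustration (the sizes and the random hidden state are invented for the demo):

```python
import torch

torch.manual_seed(0)

# An untrained "LM head": its weights are just random numbers.
random_head = torch.nn.Linear(16, 100)

hidden = torch.randn(16)  # stand-in for a hidden state from the transformer
probs = torch.softmax(random_head(hidden), dim=-1)

print(probs.max().item())     # small - no token is strongly preferred
print(probs.argmax().item())  # an arbitrary token ID ("banana", in effect)
```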
Training Adds the Backward Pass
Training needs more than the forward pass. After each prediction, the model must figure out how to improve.
The backward pass (backpropagation) computes gradients - essentially asking "if I nudge this weight up or down, does the error get better or worse?" This requires storing intermediate values from the forward pass and running calculations backward through every layer.
Inference skips all of this. Weights are frozen. Just run forward and output the prediction.
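Here is the contrast in code - a minimal sketch where one linear layer stands in for the whole transformer, with made-up inputs and targets:

```python
import torch

model = torch.nn.Linear(16, 100)  # stand-in for the full transformer
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 16)                 # a batch of inputs (invented)
targets = torch.randint(0, 100, (8,))  # the "actual next words"

# One training step: forward AND backward.
logits = model(x)                                    # forward pass
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()   # backward pass: a gradient for every weight
opt.step()        # nudge each weight in the direction that lowers the loss
opt.zero_grad()

# Inference: forward only, weights frozen.
with torch.no_grad():
    logits = model(x)  # same computation, nothing stored for backprop
```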
Why This Course Focuses on Inference
Understanding the forward pass is the foundation. It's where you learn what each component actually does (the sketch after this list traces each step in code):
- How embeddings convert words to vectors
- How attention finds relationships between words
- How feed-forward networks transform representations
- How the LM head produces vocabulary scores
- How softmax converts scores to probabilities
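Here's that whole flow as bare matrix operations - a single attention head with no masking, layer norm, or stacked layers, and random stand-in weights, so the output is meaningless; the point is the shape of the computation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, d = 100, 16

tok = torch.tensor([3, 17, 9])   # "the cat sat" as made-up token IDs
E = torch.randn(vocab, d)        # embedding table
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
W_ff1, W_ff2 = torch.randn(d, 4 * d), torch.randn(4 * d, d)
lm_head = torch.randn(d, vocab)

x = E[tok]                                        # 1. words -> vectors
q, k, v = x @ W_q, x @ W_k, x @ W_v               # 2. attention relates positions
x = x + F.softmax(q @ k.T / d**0.5, dim=-1) @ v   #    (plus residual)
x = x + F.relu(x @ W_ff1) @ W_ff2                 # 3. feed-forward transform
logits = x @ lm_head                              # 4. LM head: vocabulary scores
probs = F.softmax(logits[-1], dim=-1)             # 5. scores -> probabilities

print(probs.shape)  # torch.Size([100]): one probability per vocabulary entry
```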
Training adds machinery on top of the forward pass (gradients, optimization, loss functions), but the core computation is the same. Once you understand how data flows forward through the transformer, you understand what the model is doing - whether in training or inference.
This course teaches the forward pass in detail. By Module 4, you'll understand every step from "the cat" to predicting "sat". That's the complete inference pipeline - and the foundation for understanding training if you go deeper.
Our Pre-Trained Tiny Model
We'll work with a tiny GPT that's already trained. The weights are learned and frozen - we're purely in inference mode. You'll load these weights, run the forward pass, and see exactly how predictions emerge from the architecture.
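In PyTorch terms, loading frozen weights looks roughly like this. Everything here - the stand-in architecture, the file name, the token IDs - is a placeholder; the course's actual tiny GPT will differ:

```python
import torch

# Hypothetical stand-in for the course's tiny GPT architecture.
def make_model():
    return torch.nn.Sequential(
        torch.nn.Embedding(100, 16),
        torch.nn.Linear(16, 100),
    )

# This part happened "beforehand": training (omitted here) produced saved weights.
torch.save(make_model().state_dict(), "tiny_gpt.pt")

# What you'll do: build the architecture, then load the trained weights.
model = make_model()
model.load_state_dict(torch.load("tiny_gpt.pt"))
model.eval()  # purely inference mode from here on

with torch.no_grad():
    logits = model(torch.tensor([[3, 17]]))  # the forward pass on a new input
```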
The training happened beforehand on simple English sentences. What you'll study is how those trained weights process new inputs to make predictions. That's inference - and that's what this course teaches.