
What is an LLM?

Text In, Text Out

Type "The capital of France is" into ChatGPT and it responds: "Paris." Type "Once upon a time" and it continues the story. Type a question and it answers. These models take text as input and produce text as output.

This behavior looks like intelligence, as if the model understands what you're asking. What's actually happening is simpler and, in some ways, more fascinating: the model is predicting the next most likely word, over and over again.

[Figure: LLM Black Box]

Predicting the Next Word

An LLM's core function is next-word prediction. Given the text "The cat sat on the", the model outputs probabilities for what comes next:

  • "mat" (high probability)
  • "floor" (high probability)
  • "table" (medium probability)
  • "banana" (very low probability)

The model assigns a probability to every word in its vocabulary - typically 50,000 to 100,000 entries (technically tokens, which can be whole words or word fragments). It picks the most likely option (or samples from the top options) and outputs that word. Then it appends that word to the input and predicts again. This cycle repeats until the model generates a complete response, as sketched below.
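The loop below is a minimal sketch of that cycle in Python. The probability table is invented purely for illustration - a real LLM computes probabilities over its entire vocabulary with a neural network - but the predict-append-repeat structure is the same.

```python
import random

# Toy stand-in for a trained model: maps a context string to
# next-word probabilities. The numbers are invented for illustration;
# a real LLM scores every token in its vocabulary with a neural network.
def toy_next_word_probs(context: str) -> dict[str, float]:
    if context.endswith("The cat sat on the"):
        return {"mat": 0.55, "floor": 0.30, "table": 0.14, "banana": 0.01}
    return {".": 1.0}  # end the sentence for any other context

def generate(prompt: str, max_words: int = 5) -> str:
    text = prompt
    for _ in range(max_words):
        probs = toy_next_word_probs(text)
        # Draw the next word in proportion to its probability,
        # then append it and repeat with the longer context.
        words, weights = zip(*probs.items())
        next_word = random.choices(words, weights=weights, k=1)[0]
        if next_word == ".":
            text += "."
            break
        text += " " + next_word
    return text

print(generate("The cat sat on the"))   # e.g. "The cat sat on the mat."
```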

[Figure: Next Word Prediction]

How It Learns Patterns

How does the model know "Paris" follows "The capital of France is"? It learned this pattern from examples. During training, the model processed billions of sentences from books, websites, and articles. It saw sentences like:

  • "The capital of France is Paris"
  • "Paris is the capital of France"
  • "France's capital city, Paris, is known for..."

After seeing thousands of similar examples, the model learned the statistical relationship between "capital of France" and "Paris." When you type that phrase, the model recognizes the pattern and predicts "Paris" with high confidence.

The model doesn't have a database of facts. It doesn't look up "France" in a table. It learned statistical patterns from text, similar to how you might notice that "once upon a time" usually starts fairy tales after reading hundreds of stories.
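To make that counting intuition concrete, here is a toy sketch that extracts next-word statistics from a handful of sentences. Real models see billions of sentences and learn with neural networks and gradient descent rather than raw counts, but the core idea - repeated patterns in text become probabilities - carries over.

```python
from collections import Counter, defaultdict

# A tiny stand-in "training corpus" (illustrative only).
corpus = [
    "the capital of france is paris",
    "paris is the capital of france",
    "the capital of france is paris of course",
    "the capital of italy is rome",
]

# Count how often each word follows a given 4-word context.
context_counts: dict[tuple, Counter] = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 4):
        context = tuple(words[i:i + 4])
        context_counts[context][words[i + 4]] += 1

# Turn the counts for one context into probabilities.
context = ("capital", "of", "france", "is")
counts = context_counts[context]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"{word}: {count / total:.2f}")   # paris: 1.00
```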

Training vs Inference

The process of learning from billions of examples is called training. This happens once, before the model is deployed, and requires massive computational resources - thousands of powerful GPUs running for weeks or months. Companies like OpenAI, Anthropic, and Google handle training.

When you use ChatGPT or Claude, you're using the model in inference mode. The model isn't learning anymore - its patterns are frozen. It's applying the patterns it already learned to predict your next word. Inference is much faster and cheaper than training, which is why you can get responses in seconds.

Think of training as writing a program that takes months to compile, and inference as running that compiled program instantly whenever someone makes a request.
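The sketch below contrasts the two phases using the same counting idea. The function names and structure are illustrative, not any vendor's actual pipeline, but they show why training is slow and done once while inference is a fast application of frozen parameters.

```python
# Schematic contrast between training and inference (illustrative only).

def train(corpus: list[str]) -> dict:
    """Expensive, done once: scan every example and build the model's
    parameters. For real LLMs this means weeks of GPU time adjusting
    billions of neural-network weights."""
    model = {}
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - 1):
            model.setdefault(words[i], []).append(words[i + 1])
    return model  # parameters are now frozen

def infer(model: dict, word: str) -> str:
    """Cheap, done on every request: apply the frozen parameters.
    No learning happens here."""
    candidates = model.get(word, ["<end>"])
    # Pick the continuation seen most often during training.
    return max(set(candidates), key=candidates.count)

model = train(["the capital of france is paris", "the cat sat on the mat"])
print(infer(model, "france"))   # "is"
```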

[Figure: Training vs Inference]

What LLMs Can't Do

Before diving deeper, understand what LLMs cannot do:

They don't have real-time information - The model's knowledge freezes at its training cutoff date. It can't access current news, stock prices, or your private data unless you explicitly provide it.

They make mistakes confidently - LLMs hallucinate, generating plausible-sounding but incorrect information. They don't distinguish between facts they learned from reliable sources and patterns they extracted from unreliable text.

They don't truly understand - The model predicts text based on statistical patterns. It doesn't understand meaning the way humans do, and it struggles to reason about novel situations that don't resemble anything it saw during training.

They're not deterministic - The same prompt can produce different outputs. Randomness is built into the sampling process that converts probabilities into actual words.
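The sketch below shows one common way that randomness enters: raw scores are turned into probabilities (here with a softmax and a temperature knob, both illustrative choices) and a word is drawn at random in proportion to them, which is why repeated runs of the same prompt can differ.

```python
import math
import random

# How sampling turns the same score table into different words on
# different runs. The scores and temperature value are illustrative.
def sample_next_word(scores: dict[str, float], temperature: float = 1.0) -> str:
    # Softmax converts raw scores into probabilities. Lower temperature
    # sharpens the distribution (more predictable output); higher
    # temperature flattens it (more varied output).
    exps = {w: math.exp(s / temperature) for w, s in scores.items()}
    total = sum(exps.values())
    words, weights = zip(*[(w, e / total) for w, e in exps.items()])
    return random.choices(words, weights=weights, k=1)[0]

scores = {"mat": 2.0, "floor": 1.5, "table": 0.8, "banana": -3.0}
print([sample_next_word(scores) for _ in range(5)])
# e.g. ['mat', 'floor', 'mat', 'mat', 'floor'] - results vary between runs
```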

Next, you'll learn about the transformer architecture - the specific blueprint that powers modern LLMs like ChatGPT and Claude. Understanding this architecture reveals how text flows through mathematical operations to produce predictions.
