Functions with Learnable Parameters
What is a Neural Network?
The term "neural network" sounds complex and biological. In reality, a neural network is just a function—a mathematical operation that takes inputs and produces outputs. The special part? The function has parameters (numbers) that can be adjusted to change its behavior.
Think of a simple function like f(x) = 2x + 5. The input x could be anything, but the numbers 2 and 5 are fixed. A neural network uses the same structure, except those numbers aren't fixed—they're learnable. The model adjusts them during training to make better predictions.
A Simple Example: Predicting House Prices
Suppose you want to predict house prices based on square footage. A simple predictor might look like this:
price = size × weight + bias
Here, size is your input (square feet). The weight and bias are parameters. If weight = 150 and bias = 50,000, then a 1,200 sq ft house would be predicted at $230,000.
This is a neural network with one "neuron"—just multiplication and addition. No magic, no biology.
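To make this concrete, here is a minimal Python sketch of that single neuron, using the example numbers from above (the function name predict_price is just for illustration, not part of any library).

```python
# One "neuron": multiply the input by a weight, add a bias. Nothing else.

def predict_price(size_sqft: float, weight: float, bias: float) -> float:
    """Predict a house price from square footage with a single linear neuron."""
    return size_sqft * weight + bias

weight = 150.0     # dollars per square foot
bias = 50_000.0    # base price in dollars

print(predict_price(1_200, weight, bias))  # 230000.0
```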
Weights and Biases
The two types of parameters in neural networks have specific names and roles.
Weights scale the input. In our house price example, the weight (150) determines how much each square foot contributes to the price. Larger weights mean the input has a bigger effect on the output.
Biases shift the output up or down. The bias (50,000) represents the base price before considering size. Even a zero-square-foot house (physically nonsensical, but mathematically valid) would be assigned this base value.
Together, input × weight + bias creates a linear function. This is the building block of all neural networks. Complex models like GPT just stack billions of these simple operations together.
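A quick sketch of the two roles side by side, using the same house: raising the weight changes how much each square foot adds, while raising the bias shifts every prediction by the same amount (the alternative values 200 and 80,000 are made up for illustration).

```python
# Same 1,200 sq ft house, three different settings of the parameters.

def predict_price(size_sqft, weight, bias):
    return size_sqft * weight + bias

original = predict_price(1_200, weight=150, bias=50_000)        # 230,000
higher_weight = predict_price(1_200, weight=200, bias=50_000)   # 290,000: every sq ft counts more
higher_bias = predict_price(1_200, weight=150, bias=80_000)     # 260,000: the whole prediction shifts up

print(original, higher_weight, higher_bias)
```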
Training = Finding Good Parameters
How do we find good values for weight and bias? Training. The model starts with random values. It makes predictions on known data (houses where we know the actual price). When predictions are wrong, the model adjusts the parameters slightly to reduce the error. Repeat millions of times, and the parameters converge to good values.
This process is called gradient descent, but the core concept is simple: adjust parameters to minimize mistakes.
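Here is a minimal sketch of that training loop for the one-neuron model, assuming a made-up three-house dataset that follows price = 150 × size + 50,000 exactly. Sizes are in thousands of square feet and prices in thousands of dollars so that plain gradient descent converges without extra tricks; real training code looks different, but the idea is the same.

```python
# Minimal gradient descent for the one-neuron house price model.
# Dataset is invented for illustration: (size in thousands of sq ft, price in thousands of dollars).
data = [(1.0, 200.0), (1.5, 275.0), (2.0, 350.0)]

weight, bias = 0.0, 0.0   # arbitrary starting point (random values would also work)
learning_rate = 0.1

for step in range(5_000):
    # Gradient of the mean squared error with respect to each parameter.
    grad_w = sum(2 * (size * weight + bias - price) * size for size, price in data) / len(data)
    grad_b = sum(2 * (size * weight + bias - price) for size, price in data) / len(data)
    # Nudge each parameter a small step in the direction that reduces the error.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(round(weight, 2), round(bias, 2))  # ~150.0 and ~50.0, i.e. $150/sq ft and a $50,000 base price
```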
Why "Neural" is Misleading
The term "neural network" comes from early attempts to model how neurons in the brain work. Biological neurons receive signals, process them, and fire outputs. Early AI researchers thought input × weight + bias resembled this behavior.
However, modern neural networks have little to do with biology. We don't model dendrites, axons, or neurotransmitters. The mathematical operations are completely different from how real neurons function. The brain's billions of neurons operate in parallel with chemical and electrical signals. Neural networks in software are just functions with parameters, running on GPUs.
The name stuck, but "parameterized function" would be more accurate.
More Than One Input
Real predictors rarely use just one input. For house prices, you might consider size, number of bedrooms, age, and location. Each input gets its own weight.
price = (size × w₁) + (bedrooms × w₂) + (age × w₃) + (location × w₄) + bias
Now we have five parameters: four weights and one bias. The model learns how much each feature contributes. If w₁ = 150 and w₂ = 20,000, each square foot adds $150 to the prediction and each bedroom adds $20,000; for a typical 1,200 sq ft house, the size term ends up contributing far more than the bedroom term.
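A sketch of the same idea in code: the prediction is a weighted sum of the features plus the bias. The feature values and weights below are made up; in particular, encoding location as a single 0-to-1 score and giving age a negative weight are simplifications for illustration.

```python
# One neuron with several inputs: each feature gets its own weight.

def predict_price(features, weights, bias):
    """Weighted sum of the inputs plus a bias (a dot product plus a constant)."""
    return sum(x * w for x, w in zip(features, weights)) + bias

features = [1_200, 3, 15, 0.8]           # size (sq ft), bedrooms, age (years), location score
weights  = [150, 20_000, -500, 30_000]   # w1..w4: how much each feature contributes
bias = 50_000

print(predict_price(features, weights, bias))  # 306500
```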
The Core Building Block
The examples above demonstrate the fundamental building block of neural networks: linear transformations using weights × inputs + bias. This operation appears billions of times in models like GPT-4. However, it's not the complete picture.
Small predictor (our example): 5 parameters (4 weights + 1 bias)
GPT-3: 175 billion parameters
The scale difference is enormous. But there's a more important difference: large models don't just stack linear transformations. They intersperse them with non-linear operations (activation functions) that dramatically increase their power. Without non-linearity, stacking multiple layers would be pointless—linear operations combined remain linear.
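A one-dimensional sketch of why the non-linearity matters: two stacked linear layers collapse into a single linear layer with a combined weight and bias, so the extra layer adds no expressive power (the numbers are arbitrary).

```python
# Two stacked linear "layers" with no activation in between collapse into one.

def layer(x, weight, bias):
    return x * weight + bias

w1, b1 = 2.0, 3.0
w2, b2 = 4.0, 5.0

x = 7.0
stacked = layer(layer(x, w1, b1), w2, b2)        # layer 2 applied to layer 1's output

# The same function written as a single layer: weight w2*w1, bias w2*b1 + b2.
collapsed = layer(x, w2 * w1, w2 * b1 + b2)

print(stacked, collapsed)  # both print 73.0
```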
We'll build toward that complete picture. For now, focus on the linear transformation. It's the foundation. Module 4 adds the non-linear components that make deep networks work.
What We've Learned
Neural networks are functions with learnable parameters. The parameters (weights and biases) determine the function's behavior. Training adjusts these parameters to minimize prediction errors.
There's no magic. The "neural" terminology is historical and misleading. Modern neural networks are mathematical functions that process numbers.
The next article explores the core operation that makes neural networks efficient: matrix multiplication. This operation allows processing many inputs simultaneously, which is critical for transformers processing entire sentences at once.