Neural Networks — hidden layers, weights, and why they’re powerful (Crash Course AI)
In the Crash Course explanation, a perceptron imitates one neuron. A neural network connects
many perceptrons together, which lets it solve harder tasks (especially image recognition).
Why neural networks became dominant
- Hidden layers let the model build up patterns from simple → complex.
- They work very well for tasks like image recognition.
- They use lots of maths, so they benefit from fast computers.
Big history story (ImageNet → AlexNet)
- ImageNet was a huge labelled photo dataset to help researchers test image-recognition algorithms.
- In 2012, AlexNet combined a deeper neural network with faster hardware (GPUs) and beat the other ImageNet competition entries by a wide margin.
- This triggered a big surge in neural network research and applications.
Neural network structure (the 3 layers idea)
Input layer → Hidden layer(s) → Output layer.
The hidden layers do most of the “pattern finding”.
Inputs are numbers (features)
- A feature is one piece of input information (a number).
- Numbers can represent many things:
  - Sound: amplitudes of a wave
  - Text: word frequencies (how often words appear)
  - Images: pixel values
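To make "text as numbers" concrete, here is a minimal Python sketch that counts how often each word appears. It assumes words are simply separated by spaces; the function name word_frequencies and the sample sentence are made up for illustration.

```python
from collections import Counter

def word_frequencies(text):
    """Turn a piece of text into word-count features (one number per word)."""
    words = text.lower().split()   # assumption: words are separated by spaces
    return Counter(words)

features = word_frequencies("The dog chased the ball")
print(features["the"])   # 2 ("The" and "the" are counted together)
print(features["cat"])   # 0 (a word that never appears)
```

Each count is one feature, so a longer vocabulary means more input numbers.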
Pixels as features (why images are huge)
- Greyscale pixel brightness can be scaled to a number between 0 and 1.
- Colour pixels are often represented using R, G, B numbers (3 values per pixel).
- A 1000×1000 colour image → about 3 million input numbers.
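The arithmetic behind these numbers can be checked with a short Python sketch. It assumes pixel values are stored as whole numbers from 0 to 255 (a common format, but not stated in these notes); count_inputs and scale_pixel are illustrative names.

```python
def count_inputs(width, height, channels):
    """Number of input features for an image: one per pixel per colour channel."""
    return width * height * channels

def scale_pixel(value, max_value=255):
    """Scale a raw pixel value to the range 0 to 1 (assumes 0..255 storage)."""
    return value / max_value

print(count_inputs(1000, 1000, 3))   # 3000000 (colour: R, G, B)
print(count_inputs(1000, 1000, 1))   # 1000000 (greyscale)
print(scale_pixel(255))              # 1.0 (brightest)
```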
Weights and “squish” (what neurons do)
Each neuron takes many inputs, multiplies them by weights, adds them up, then “squishes” the result
so the output is between 0 and 1.
Weighted sum = (input1 × weight1) + (input2 × weight2) + ...
Then "squish" to 0..1 → neuron output (a number)
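Here is a minimal Python sketch of one neuron. The notes only say the result is "squished" to between 0 and 1; the sigmoid function used here is one common choice and is an assumption, as are the example inputs and weights.

```python
import math

def squish(x):
    """Sigmoid: one common squishing function (an assumption, not from the notes)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights):
    """Multiply each input by its weight, add them up, then squish to 0..1."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return squish(weighted_sum)

# (0.5 x 2.0) + (1.0 x -1.0) = 0.0, and squish(0.0) is exactly halfway.
print(neuron_output([0.5, 1.0], [2.0, -1.0]))   # 0.5
```

A large positive weighted sum gives an output close to 1, and a large negative one gives an output close to 0.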
Positive vs negative weights (the dog nose example)
If a neuron is looking for a bright curve, it uses positive weights on pixels that should be bright.
If it expects a darker area, it can use negative weights on those pixels.
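A small Python sketch shows the effect. The three-pixel patches and the weight values are invented for illustration: the first two pixels are expected to be bright (positive weights) and the last one dark (negative weight).

```python
def weighted_sum(pixels, weights):
    """Add up each pixel brightness multiplied by its weight."""
    return sum(p * w for p, w in zip(pixels, weights))

weights = [1.0, 1.0, -1.0]        # bright, bright, dark (illustrative values)
matching = [0.9, 0.8, 0.1]        # patch that fits the pattern
non_matching = [0.1, 0.2, 0.9]    # patch that is the opposite of the pattern

# The matching patch scores higher, so the neuron "fires" more strongly for it.
print(weighted_sum(matching, weights) > weighted_sum(non_matching, weights))   # True
```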
Hidden layers: simple → complex patterns
Early layers can detect simple patterns (edges/curves). Later layers combine these signals to detect more complex patterns.
The network does not “understand” noses — it detects patterns of light/dark.
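As a rough sketch of how signals pass from layer to layer, the Python below feeds two inputs through one hidden layer and then one output neuron. The sigmoid "squish", the layer sizes, and every weight value are assumptions chosen for illustration; a real network learns its weights from data.

```python
import math

def squish(x):
    """Sigmoid squishing function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """Run one layer: each row of weights belongs to one neuron."""
    return [squish(sum(i * w for i, w in zip(inputs, row))) for row in weight_rows]

def forward(inputs, hidden_weights, output_weights):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_weights)   # simple patterns
    return layer(hidden, output_weights)     # combinations of those patterns

hidden_weights = [[0.5, -0.5], [1.0, 1.0]]   # 2 hidden neurons, 2 inputs each
output_weights = [[1.0, -1.0]]               # 1 output neuron, 2 hidden inputs
outputs = forward([0.2, 0.8], hidden_weights, output_weights)
print(outputs)   # a list with one number between 0 and 1
```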
Output layer = probabilities
Output neurons represent labels. Their values can be treated like probabilities (e.g. “dog: 0.93” ≈ 93%).
The predicted label is usually the one with the highest probability.
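Picking the predicted label can be sketched in a few lines of Python. The "dog: 0.93" value comes from the example above; the other labels and numbers are made up. The softmax function is one common way to turn raw output scores into values that add up to 1, and is an assumption here rather than something these notes describe.

```python
import math

def softmax(scores):
    """Turn raw scores into positive values that add up to 1 (assumed method)."""
    biggest = max(scores)                            # subtract the max for numerical safety
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(probabilities):
    """Return the label with the highest probability."""
    return max(probabilities, key=probabilities.get)

probabilities = {"dog": 0.93, "cat": 0.05, "bird": 0.02}
print(predict(probabilities))   # dog

converted = softmax([2.0, 1.0, 0.1])   # illustrative raw scores
print(sum(converted))                  # very close to 1
```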
Pros (why people like neural networks)
- Excellent at complex pattern tasks (especially vision).
- Deeper networks can solve trickier problems (“deep learning”).
- Very widely used in real systems today.
Cons / trade-offs (important!)
- Compute cost: more layers → more maths → needs faster hardware.
- Explainability: deeper layers can become hard for humans to interpret (“black box”).
- Fairness risk: still depends on training data, so bias can appear.
Why “explainability” matters (real world)
If a model helps decide important things (e.g. loans, fraud detection, medical screening), people may need to understand why
the model made a decision. In some jurisdictions (for example, under the EU's GDPR), people have legal rights concerning automated decisions.
Link to your classroom activity (weights made simple)
Your “WeTube” activity was a simplified way to see weights changing influence. Real neural networks also learn weights,
but on a much larger scale.
Reinforcement Learning (RL) — “learning by doing” (Crash Course cookie jar example)
RL is useful when there isn’t one “right” answer to copy. Instead, the system learns by trial-and-error.
If something works, it gets a reward and learns to do more of that in future.
The cookie jar idea (why RL exists)
If you want a cookie on a tall shelf, there are many possible methods (ladder / lasso / pulleys).
You don’t have a teacher showing the “correct move” every step. You learn by trying things and seeing what works.
RL vocabulary you must know
- Agent: the learner (the AI that is trying to get better).
- Environment: the world the agent is in (a game board, a maze, a room, the “kitchen”).
- State: what the agent currently knows/observes (e.g. where it is, what it can see).
- Action: what the agent can do next (move up/down/left/right, jump, pick a move in a game).
- Reward: a positive or negative signal telling the agent how well it did (“good job” / “bad idea”).
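The vocabulary above fits together as a loop: the agent picks an action, the environment returns a new state and a reward. Here is a minimal Python sketch using a made-up environment (a corridor of 5 squares with the goal at the far end) and an agent that just moves at random. The class name Corridor and all of its details are hypothetical.

```python
import random

class Corridor:
    """Hypothetical environment: positions 0..4, with the goal at position 4."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0                       # the agent starts at the left end

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return state, reward, done."""
        self.state = min(max(self.state + action, 0), self.length - 1)  # walls block movement
        done = self.state == self.length - 1
        reward = 1 if done else 0            # reward only when the goal is reached
        return self.state, reward, done

def run_episode(env, rng, max_steps=1000):
    """Agent loop: choose a random action, observe the result, repeat."""
    total_reward = 0
    for step_count in range(1, max_steps + 1):
        action = rng.choice([-1, 1])         # a random agent (no learning yet)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            return step_count, total_reward
    return max_steps, total_reward

steps, total = run_episode(Corridor(), random.Random(0))
print(steps, total)
```

A learning agent would replace the random choice with a policy that improves as rewards come in.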
How RL differs from other learning
- Supervised learning: each example (or action) usually comes with a label telling you the correct answer.
- Unsupervised learning: no labels; find groups/patterns.
- Reinforcement learning: you might only get feedback at the end (“success” / “fail”).
Why RL is powerful
It can train skills we find hard to explain in rules (e.g. walking). We reward success and let the agent discover how.
Credit assignment (the tricky bit)
If the agent gets a reward only at the end, which earlier actions deserve credit?
RL must work out which actions and states helped reach the reward (and which didn’t).
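One simple way to share out credit, sketched in Python below, is to give the most credit to the states closest to the reward and a little less to each earlier state. The discount of 0.9 per step, the function name assign_credit, and the state names are all assumptions for illustration; real RL methods handle this in more sophisticated ways.

```python
def assign_credit(visited_states, final_reward, discount=0.9):
    """Share a final reward backwards along the path, shrinking it each step."""
    credit = {}
    share = final_reward
    for state in reversed(visited_states):   # start from the state nearest the reward
        if state not in credit:              # keep the visit closest to the reward
            credit[state] = share
        share *= discount                    # earlier states get a smaller share
    return credit

credit = assign_credit(["A", "B", "C", "goal"], final_reward=1.0)
print(credit["goal"])                        # 1.0
print(credit["C"] > credit["B"] > credit["A"])   # True
```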
Value and Policy (decision-making)
- Value: a number describing how “good” a state is (how likely it is to lead to reward).
- Policy: the agent’s strategy for choosing actions (what it tends to do next).
Donut “yummy-ness” idea
High value can also mean high risk. Policies can choose safe or risky actions depending on goals.
Explore vs Exploit
- Exploit: use what you already know works (safe points now).
- Explore: try new actions to find better strategies (may lose points short-term).
Important trade-off
Exploration can look “worse” at first, but can discover shortcuts that win more in the long run.
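A common way to balance the two is the "epsilon-greedy" rule: most of the time exploit the best-known action, but with a small probability epsilon pick a random action instead. This rule is not named in these notes, so treat it as one possible approach; the action names and values below are invented.

```python
import random

def choose_action(action_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))         # explore: any action
    return max(action_values, key=action_values.get)   # exploit: highest value so far

action_values = {"up": 0.2, "down": 0.1, "left": 0.0, "right": 0.7}
rng = random.Random(0)
print(choose_action(action_values, epsilon=0.0, rng=rng))   # right (always exploits)
print(choose_action(action_values, epsilon=1.0, rng=rng))   # a random action (always explores)
```

A higher epsilon means more exploring; many setups start high and lower it over time.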
John-Green-bot grid example (the one you described)
Agent moves around a room (environment). Actions are up/down/left/right. Reaching the charging station gives +1 reward.
After success, we increase the value of states near the goal so the agent is guided next time.
- Exploit: use the first known path (safe, but might be long).
- Explore: try many paths (some bad) to discover a shorter route.
- New policy: follow the highest-value states → short path.
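The grid idea can be sketched in Python. This version assumes a 3×4 room, a value of 1 at the charging station, and values that shrink by a factor of 0.9 for each step away from it. Those numbers, and the way values are spread by repeated sweeps, are assumptions for illustration rather than the exact method from the video.

```python
def neighbours(cell, rows, cols):
    """Cells reachable by moving up, down, left or right (staying inside the room)."""
    r, c = cell
    steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in steps if 0 <= nr < rows and 0 <= nc < cols]

def compute_values(rows, cols, goal, discount=0.9, sweeps=50):
    """Spread value outwards from the goal: nearer states end up with higher values."""
    values = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    values[goal] = 1.0
    for _ in range(sweeps):
        for cell in values:
            if cell != goal:
                best = max(values[n] for n in neighbours(cell, rows, cols))
                values[cell] = discount * best
    return values

def follow_policy(start, goal, values, rows, cols):
    """Policy: always move to the neighbouring state with the highest value."""
    path = [start]
    while path[-1] != goal:
        path.append(max(neighbours(path[-1], rows, cols), key=values.get))
    return path

values = compute_values(3, 4, goal=(0, 3))
path = follow_policy((2, 0), (0, 3), values, 3, 4)
print(len(path) - 1)   # 5 moves, the shortest possible route from (2, 0) to (0, 3)
```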
Negative rewards (penalties)
RL can use penalties to teach avoidance (e.g. a “black hole” tile gives a negative reward).
This changes values and can change the best policy.
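A tiny Python sketch shows how a penalty can change which route is best. The two routes and the reward numbers (+1 for the goal, -1 for the "black hole") are made up for illustration.

```python
def total_reward(path, rewards):
    """Add up the reward for every tile on the path (unlisted tiles give 0)."""
    return sum(rewards.get(tile, 0) for tile in path)

rewards = {"goal": 1, "black_hole": -1}
short_risky = ["start", "black_hole", "goal"]   # shorter, but crosses the penalty tile
long_safe = ["start", "a", "b", "goal"]         # longer, but avoids it

print(total_reward(short_risky, rewards))   # 0
print(total_reward(long_safe, rewards))     # 1
```

Without the penalty both routes would score 1, so the shorter one would look just as good.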
Environments can change
Real life isn’t fixed like a simple grid. Roads change, obstacles appear — RL becomes harder.
Value function (slightly more advanced, but you mentioned it)
RL often uses a value function (a mathematical formula) to calculate how good a state is, then chooses actions that maximise expected reward.
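For reference, one standard textbook way to write a value function is shown below. These notes do not give a formula, so the symbols are assumptions: s is a state, π is the policy, r is a reward, and γ (a number between 0 and 1) is a discount factor that makes rewards further in the future count for less.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_{t+1} \;\middle|\; s_0 = s \right]
```

In words: the value of a state is the total (discounted) reward the agent expects to collect if it starts there and follows its policy.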
Reality check (exam-friendly)
RL can be powerful, but many RL problems need a lot of data and a lot of time.
“Deep reinforcement learning” uses neural networks + big computing to explore huge numbers of states.
One-minute summary (say this out loud)
AI uses models to solve problems. Rule-based systems follow human-written rules.
Data-driven systems learn patterns from data using machine learning.
Classification uses labelled classes and outputs predictions with confidence.
Bad or unrepresentative data can cause bias, so we train, test, and check fairness.
Decision trees are explainable models using feature splits.
Neural networks connect many perceptrons and use hidden layers + weights to produce probabilities (great for images but can be hard to explain).
Reinforcement learning uses an agent that improves through rewards, policies, values, and trial & error (explore vs exploit).