The frozen mind: how LLM training actually works (and what it can never do)

Going deeper than next-word prediction — what embeddings, attention, and RLHF actually mean, and why understanding the training process makes you dramatically more effective with these tools.

The last time we covered this, we said an LLM is next-word prediction. That’s true. But it’s also like saying a car is “a thing that moves.” Technically accurate, practically incomplete.

If you want to get consistently good results from these systems — and stop being surprised by what they get wrong — you need to understand the mechanism more precisely. Not as a math exercise. As a practical map of the terrain you’re working in.

The more you know the constraints, the better you can work within them.

What training actually means

Absorbing patterns, adjusting weights — that’s the simplified version. Here’s the part most explanations skip: how those weights actually get adjusted.

The process is called gradient descent. Step by step:

  1. Show the model a sentence with the last word removed.
  2. The model predicts what word comes next.
  3. Compare the prediction to the actual word.
  4. Adjust all the weights — billions of tiny numerical dials — slightly toward the correct answer.
  5. Repeat. Billions of times.
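
Under heavy simplification, that loop can be sketched in a few lines. This is a toy with one weight per (context word, next word) pair and a one-word context — nothing like a real transformer, which shares billions of weights across all predictions — but the mechanics of step 2 through 4 are the same:

```python
import math
import random

random.seed(0)

# Toy corpus: (context word, next word) pairs.
corpus = [("paris", "france"), ("london", "england"), ("paris", "france")]
vocab = ["paris", "london", "france", "england"]
idx = {w: i for i, w in enumerate(vocab)}

# One weight per (context, next) pair -- billions of shared weights in a
# real model, kept separate here for clarity.
W = [[0.0] * len(vocab) for _ in vocab]

def softmax(row):
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

lr = 0.5
for _ in range(300):
    ctx, nxt = random.choice(corpus)      # 1. show an example
    probs = softmax(W[idx[ctx]])          # 2. predict the next word
    for j in range(len(vocab)):           # 3-4. compare, nudge every weight
        target = 1.0 if j == idx[nxt] else 0.0
        W[idx[ctx]][j] -= lr * (probs[j] - target)

probs = softmax(W[idx["paris"]])
prediction = vocab[probs.index(max(probs))]
print(prediction)
```

Nobody wrote "paris → france" into the weights. The tendency emerges from repeated small nudges — which is the whole point.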

No programmer decides “the model should know Paris is in France.” The model figures out that “Paris is in” tends to precede “France” because it sees that pattern across millions of documents. The knowledge isn’t stored as a fact anywhere. It’s stored as a statistical tendency, distributed across billions of weights.

This is important: there is no fact database. When the model tells you Paris is the capital of France, it isn’t looking anything up. It’s generating the sequence that’s statistically most probable given what it learned during training. Usually that’s correct. Sometimes it isn’t — and there’s no separate mechanism to catch the error.

How words become numbers: embeddings

Before the model can predict anything, it has to convert words into numbers. This is done through embeddings — and the way they work is one of the most surprising things about these systems.

Every token gets mapped to a point in a high-dimensional space. Not three dimensions. Thousands.

The geometry of that space encodes meaning. Words that tend to appear in similar contexts end up near each other. “Dog” and “puppy” cluster together. “Bank” (financial) and “bank” (river) land in different neighborhoods. “Paris” and “London” are near each other, as are “France” and “England” — and the relationship between Paris and France mirrors the relationship between London and England, geometrically.

This is how a model can complete “Paris is to France as London is to ___” without storing it as a fact. The answer falls out of the geometry.
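
A hand-made miniature shows the idea. These 3-dimensional vectors are invented for illustration — real embeddings have thousands of learned dimensions — but the analogy arithmetic works the same way:

```python
import math

# Hand-picked 3-d vectors, illustrative only. Real embeddings are learned,
# not designed, and have thousands of dimensions.
emb = {
    "paris":   [0.9, 0.1, 0.8],   # roughly: city-ness, country-ness, French-ness
    "london":  [0.9, 0.1, 0.1],
    "france":  [0.1, 0.9, 0.8],
    "england": [0.1, 0.9, 0.1],
    "dog":     [0.5, 0.5, 0.5],   # unrelated distractor
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# "Paris is to France as London is to ___":
# take france, subtract the paris direction, add the london direction.
query = [f - p + l for f, p, l in zip(emb["france"], emb["paris"], emb["london"])]

# Standard practice: the three input words are excluded from the candidates.
candidates = [w for w in emb if w not in {"paris", "france", "london"}]
answer = max(candidates, key=lambda w: cosine(query, emb[w]))
print(answer)
```

No lookup table, no stored fact — the nearest vector to "france − paris + london" just happens to be "england."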

Practical implication: LLMs are very good at relationships between concepts because those relationships are baked into how the model represents words spatially. They’re worse at precision — specific numbers, dates, names — because precise recall isn’t what embedding geometry is optimized for. Now you know why.

Attention: why it actually works

A simple next-word predictor would be terrible at language. “He picked up the ball and threw ___” requires knowing that “he” is the one throwing, “ball” is being thrown, and tracking both across the sentence.

Attention is the mechanism that does this. At every prediction step, the model scores how relevant each of the other words in the context is to the word it’s currently generating. Long-distance dependencies become manageable. The model can figure out that “it” in “the dog ate the food because it was hungry” refers to “dog” — because the attention weights, learned during training, say so.
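
The scoring step can be sketched with made-up 2-dimensional token vectors. A real transformer learns separate query, key, and value projections for each of many attention heads; this keeps only the scaled dot-product core:

```python
import math

# Toy 2-d token vectors, invented for illustration.
vecs = {"dog": [1.0, 0.2], "food": [0.1, 1.0], "it": [0.9, 0.3]}

def attention_weights(query, keys):
    scale = math.sqrt(len(query))   # scaled dot-product attention
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]   # softmax: weights sum to 1

# When processing "it", how much does it attend to each earlier word?
weights = attention_weights(vecs["it"], [vecs["dog"], vecs["food"]])
print({"dog": round(weights[0], 2), "food": round(weights[1], 2)})
```

Because "it" points in roughly the same direction as "dog," the dog weight dominates — resolution by geometry, not by understanding.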

This is why these systems are called transformers. The architecture is built around this attention mechanism. And it’s why scaling up works — more attention heads, more layers, and more parameters mean the model can track longer and more complex patterns across more context.

What attention is not: reasoning. The model isn’t thinking through “what does ‘it’ refer to?” It’s applying learned statistical patterns about pronoun reference. When those patterns are reliable, the output looks like understanding. When they fail, you get errors that seem bizarre — because a genuinely thinking agent wouldn’t make them.

Fine-tuning and RLHF: shaping behavior after training

Pre-training gives you a model that’s good at predicting text. But “predict whatever humans wrote” is a terrible default — it means the model would freely continue hateful content, generate dangerous instructions, or produce toxic outputs, because those exist in the training data.

Fine-tuning adjusts the base model on a smaller, curated dataset. Instead of continuing any text it sees, the model learns to produce outputs in a specific style, domain, or format.

RLHF — reinforcement learning from human feedback — goes further. Human raters compare pairs of model outputs and say which is better. A separate model (a “reward model”) learns to predict human preferences. The main model is then trained to maximize those scores.
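
The preference-learning step can be sketched under a deliberately crude assumption: each output is reduced to a single "fluency" number, and the reward model is one parameter. A real reward model is a full network scoring raw text, but the Bradley-Terry-style update is the same shape:

```python
import math
import random

random.seed(0)

# Each pair: (fluency of the output the rater chose, fluency of the rejected one).
# Assumption for this sketch: raters consistently preferred the more fluent output.
pairs = [(0.9, 0.2), (0.8, 0.4), (0.7, 0.1)]

w = 0.0    # the reward model's only parameter: how much fluency predicts approval
lr = 0.1
for _ in range(500):
    chosen, rejected = random.choice(pairs)
    # Bradley-Terry: probability the reward model agrees with the human rater
    p = 1.0 / (1.0 + math.exp(-(w * chosen - w * rejected)))
    # Gradient ascent on the log-likelihood of the observed preference
    w += lr * (1.0 - p) * (chosen - rejected)

print(w)
```

The toy reward model ends up with a positive weight on fluency: whatever correlates with rater approval gets rewarded, accurate or not.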

This is how you get a model that sounds helpful, maintains a consistent personality, and refuses to write malware. It’s also where instruction-following comes from — the model was trained to follow instructions, so it does.

The critical caveat: RLHF optimizes for looking good to human raters, not for being accurate. Humans rate confident, fluent, well-structured answers highly. Sometimes those answers are wrong. The confidence is trained in. It’s not earned. This is why the model sounds certain even when it’s hallucinating — certainty is what gets rewarded.

Why this architecture cannot reason

Here’s the part nobody says plainly: LLMs do not reason.

Not in the way humans mean the word. Let me be specific.

Reasoning requires a model of the world. When you reason through “if I push this cup off the table, it will fall,” you’re simulating a physical system. You understand causality: the push causes the fall. LLMs learn correlations. “Push” and “fall” appear together in text. The model generates text where they appear together. It’s not simulating causality — it’s reproducing the pattern of correlation.

This matters most at the edges. When you’re asking the model to reason about a novel situation — one where the right answer requires genuine causal inference, not pattern matching — the model has no reliable mechanism. It will produce something that looks like reasoning, following the structure and vocabulary of logical argument, because it has seen millions of examples of logical argument in training. But the output is pattern-matched structure, not derived logic.

Chain-of-thought prompting — asking the model to “think step by step” — works better than asking for a direct answer. Not because it makes the model reason. Because it makes the model generate intermediate tokens that look like reasoning steps, and those tokens become context that guides the final prediction. It’s a useful trick. It improves outputs. It’s not deliberation.

Extended thinking modes in modern models generate longer token sequences before responding. Same principle, more tokens. Still not a causal simulation. Still pattern matching, with more runway.

Knowing this changes how you use these tools. You stop trusting chain-of-thought as a guarantee of correctness and start treating it as a heuristic that improves probability. You verify. You push back. You supply the causal reasoning the model can’t actually do.

Why AGI is a completely different problem

People worry that scaling these systems will produce artificial general intelligence. It won’t — not by scaling up what they currently are.

AGI, in any meaningful sense, requires things that are structurally absent from this architecture:

Goals and agency. An LLM has no objectives. It wants nothing. Every response is a reaction to a prompt — not an action taken in pursuit of anything. When the conversation ends, nothing continues. There is no “it” between conversations.

Continuous learning. The model’s weights are frozen at training. Everything you tell it in a conversation is gone when the context clears. It cannot update its understanding based on experience, cannot form new knowledge, cannot change. This is the opposite of how any intelligent system develops.

Grounding in reality. The model knows the word “red” from context — which words surround it, how it’s described, where it appears. But it has no percept of red. No experience of seeing the color. The symbols connect to other symbols, never to sensory reality. This is the symbol grounding problem, and it’s unsolved. More parameters don’t solve it — they just create a more elaborate floating symbol network.

Causal understanding. An LLM trained on physics papers answers physics questions well. It cannot discover new physics, because discovery requires running the world forward in a causal model — actually testing what happens when variables change — not retrieving the most probable sequence about physics.

The scenario people fear — a system that decides to pursue goals adversarially, improves itself, escapes control — requires things LLMs structurally lack. No goals to pursue. No mechanism for self-modification. No initiative. They react. They don’t act.

This isn’t reassuring because these systems are harmless. It’s clarifying because the actual risks are different from the science fiction ones: overconfident output, brittle reasoning at edge cases, RLHF-trained confidence that doesn’t track truth, and humans who trust the fluency more than they should.

The practical payoff

None of this is criticism. A system that’s extraordinarily good at pattern matching — synthesizing information, following complex instructions, generating useful text, reproducing structure across domains — is genuinely powerful. The key is knowing what kind of powerful.

Once you do, everything follows:

  • Give it patterns to match. Examples, reference material, specs, context. The model performs better when it’s matching against something in the prompt than generating from training weights alone.

  • Know where correlation breaks. Don’t trust the model for novel causal reasoning. Don’t ask it to infer something it can’t have learned from text. Verify outputs that depend on precision — numbers, citations, specific facts.

  • Read the confidence skeptically. Fluent and certain is the trained default. It doesn’t signal accuracy. Every factual claim is a hypothesis until you’ve checked it.

  • Work with the context window. The model isn’t remembering or planning. It’s working from what’s currently loaded. Specs and AGENTS.md files are the closest thing to persistent memory you’re going to get — use them deliberately.

  • Stop fighting the limits. This architecture is what it is. It won’t reason its way to truth, and it won’t spontaneously develop goals. Work with the pattern machine you have. It’s more useful than most people realize — as long as you don’t ask it to be something it structurally cannot be.

One more ELI5

Imagine a student who has read every book ever written but has never experienced the world. Ask about physics — they explain it fluently. Ask them to design a new experiment — they propose one in perfect experimental vocabulary. But they’ve never touched a lab bench. Never seen a result fail. Never had to figure out why.

That student is enormously useful. They know patterns, vocabulary, structure, and relationships across every domain of human knowledge. They can help you think through problems, draft documents, write code, explain complex ideas, and translate concepts across fields.

What they cannot do is think — not in the way the word matters when you’re actually working something out.

LLMs are that student. Extraordinarily well-read. Deeply pattern-aware. Frozen in place the moment training ended. And not reasoning — not the way you are when you’re genuinely figuring something out.

The more you understand that, the better you’ll be at using them.

If you want to go further:

  • Build a Large Language Model (From Scratch) by Sebastian Raschka — builds an actual LLM step by step, so you understand attention, tokenization, and training loops by doing them. The fastest way to demystify the mechanics.
  • The Alignment Problem by Brian Christian — covers RLHF, reward modeling, and why training AI systems to behave correctly is harder than it looks. Essential context for understanding why confident output doesn’t mean accurate output.