I switched to Claude Opus 4.6 — here are 5 reasons it's my default

Opus 4.6 isn't flashy. It doesn't generate images or browse the web. But for actual productivity work, nothing else comes close.

Anthropic dropped Claude Opus 4.6 last week and I’ve been using it as my primary model since day one. Not because the benchmarks told me to — because every session I run produces better output than what I was getting before.

This isn’t a comprehensive review. I’m not going to walk through every benchmark or compare it against GPT on trivia questions. I care about one thing: does this model make me more productive when I’m building software and writing content? The answer is yes, and here’s why.

1. It follows complex instructions without drift

This is the one that matters most. When you hand Opus 4.6 a detailed spec or a long AGENTS.md file, it actually holds the full context and follows it. Not just the first few rules — all of them, consistently, across a long session.

Previous models would start strong and gradually drift. By the tenth turn in a conversation, they’d forgotten constraints from the system prompt. Opus 4.6 doesn’t do this. I’ve run multi-hour coding sessions where it maintained adherence to behavioral rules I set at the start. This is the difference between a model you can supervise casually and one you have to babysit.

For anyone doing behavior-driven prompting, this is transformative. Your living specs actually work as intended across the full session, not just the first few exchanges.
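To make "living spec" concrete, here's a sketch of what such a file might contain. Everything in it is hypothetical — the rules, paths, and commands are illustrations, not a recommended template:

```markdown
# AGENTS.md — standing rules for the agent (hypothetical example)

## Code style
- TypeScript strict mode; no `any`.
- Every exported function gets a doc comment.

## Workflow
- Run `npm test` after every change; never commit on red.
- Never modify files under `vendor/`.

## Scope
- Implement only what the current task asks.
- Flag unrelated issues; do not fix them unprompted.
```

The point of the section above is that a model with strong instruction adherence keeps honoring rules like these on turn thirty, not just turn three.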

2. The writing is genuinely good

Most LLMs write like they’re trying to sound smart. Filler phrases, hedge words, unnecessary qualifications. “It’s important to note that…” “While there are many considerations…” You know the pattern.

Opus 4.6 writes clean. It’s direct when you tell it to be direct. It matches tone when you give it examples. It doesn’t pad paragraphs to look thorough. This matters when you’re using it for anything beyond code — documentation, blog posts, technical specs, client communications. The output needs less editing, which means fewer cycles through the loop.

I’ve been using it to draft sections of these blog posts. Not wholesale — I write the arguments and structure — but for turning rough notes into clean prose, it’s the best model I’ve used. It doesn’t fight my voice.

3. It reasons through multi-step problems

Ask a model to refactor a function and it’ll usually do fine. Ask it to refactor a function while maintaining backward compatibility with three different callers, updating the tests, and adjusting the documentation — and most models start dropping steps.

Opus 4.6 holds the full chain. It’ll work through the refactor, check each caller, flag the ones that need updates, modify the tests to match, and update the docs. Not because it’s doing some special chain-of-thought trick — because it actually maintains working memory across a complex task.
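The backward-compatibility piece of a task like that is concrete enough to sketch. A common pattern — shown here in Python with hypothetical names, not taken from any real codebase — is to rename the function while keeping the old name as a deprecated alias, so existing callers keep working while the tests and docs catch up:

```python
import warnings

def parse_config(path: str) -> dict:
    """New, clearer name replacing the old load_cfg()."""
    # Real parsing would go here; stubbed for illustration.
    return {"path": path}

def load_cfg(path: str) -> dict:
    """Deprecated alias kept so existing callers don't break."""
    warnings.warn(
        "load_cfg() is deprecated; use parse_config()",
        DeprecationWarning,
        stacklevel=2,
    )
    return parse_config(path)
```

Old call sites still run, new code migrates to the new name, and the test suite can assert the deprecation warning fires — exactly the kind of multi-step bookkeeping the paragraph above describes.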

This is where the spec-driven approach really pays off. When the model can hold complex instructions and reason through multi-step consequences, your specs become genuine project plans rather than aspirational documents the model half-follows.

4. Claude Code turns it into an autonomous agent

The model by itself is a chat interface. Claude Code is what turns it into something genuinely useful for software engineering. It reads your files, runs your tests, edits your code, and iterates — all within your actual development environment.

With Opus 4.6 as the backbone, Claude Code becomes remarkably capable. It doesn’t just generate code snippets for you to paste — it operates inside your project, understands the full codebase context, and makes changes that actually work on the first try more often than not. Pair this with a good AGENTS.md file and the agent has everything it needs to work semi-autonomously.

This is where the loop gets fast. Prompt, the agent implements, tests run, the agent corrects, tests pass. You’re reviewing and steering, not typing. The model’s instruction-following and multi-step reasoning mean fewer corrections per cycle.
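The control flow of that loop is simple enough to sketch. This is a hypothetical outline, not Claude Code's actual implementation — `implement` stands in for whatever invokes the agent, and `tests_pass` for running your suite:

```python
from typing import Callable

def agent_loop(
    task: str,
    implement: Callable[[str], None],   # invokes the agent with a prompt
    tests_pass: Callable[[], bool],     # runs the test suite; True = green
    max_cycles: int = 5,
) -> bool:
    """Prompt -> implement -> test -> correct, until green or out of cycles."""
    implement(task)
    for _ in range(max_cycles):
        if tests_pass():
            return True   # green: hand back to the human for review
        implement("The tests failed. Fix the failures and try again.")
    return False          # still red after max_cycles: escalate to the human
```

The claim in the section above is that a better model lowers the number of correction cycles per task — the loop structure stays the same, it just terminates sooner.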

5. It knows when to stop

This sounds minor but it’s not. Opus 4.6 answers the question you asked and stops. It doesn’t tack on unsolicited advice, generate code you didn’t request, or “improve” things you didn’t ask it to improve.

This is the pink elephant problem in reverse — instead of the model fixating on things you told it to avoid, it simply stays in scope. You ask for a bug fix, you get a bug fix. Not a bug fix plus a refactor plus new error handling plus updated comments on surrounding code.

For agent workflows, this restraint is critical. An agent that over-generates creates review burden. Every unnecessary change is something you have to evaluate, and evaluation fatigue leads to rubber-stamping, which leads to bugs. A model that does exactly what you asked — no more, no less — is a model you can trust to run longer without supervision.

The trade-offs

Opus 4.6 doesn’t do everything. It can’t generate images. It doesn’t browse the web in real time. It’s not the cheapest option — if you’re doing high-volume simple tasks, Haiku or Sonnet might be more cost-effective.

And it’s not magic. Bad prompts still produce bad output. No spec still means no direction. The model amplifies your process — if your process is well-structured, the results are exceptional. If your process is “just figure it out,” you’ll get mediocre results from any model.

The bottom line

I’ve used every major model extensively: GPT-4o, Gemini, and previous Claude versions. Opus 4.6 is the first model where I stopped switching between providers. Not because the others are bad — because this one is consistently good enough that switching isn’t worth the friction.

If you care about writing quality, instruction following, and deep reasoning for real work — not party tricks, not image generation, not web browsing — this is the model to use. Pair it with Claude Code, write a good spec, set up your AGENTS.md, and watch what happens. And if you want the loop running without tying up your daily driver, a dedicated Mac Mini as an agent server is the natural next step.