AGENTS.md outperforms skills in agent evals

Vercel found that passive context in AGENTS.md beat active skill retrieval — hitting 100% pass rates where skills stalled at 53%. Here is why that matters.

Vercel just published results from their agent evals that should change how you think about giving AI coding agents the context they need. The short version: a well-structured AGENTS.md file achieved a 100% pass rate across build, lint, and test categories. Skills, the retrieval-based alternative, topped out at 53% by default and reached only 79% even with explicit instructions to use them.

That’s not a marginal improvement. That’s a category difference.

What are skills?

Skills are an on-demand retrieval mechanism. When an agent encounters a problem, it can invoke a skill — essentially pulling in relevant documentation at the moment it needs it. The idea is sound: keep the context window lean, fetch what you need when you need it.
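
For concreteness, here is roughly what a skill looks like in the SKILL.md convention some agents use; the frontmatter fields and the nextjs-app-router name below are illustrative, not taken from Vercel's eval:

    ---
    name: nextjs-app-router
    description: Use when writing or modifying Next.js App Router code
    ---
    # Next.js App Router
    Route segments live under app/. Server Components are the default;
    add "use client" only for components that need hooks or browser APIs.

Only the name and description are visible up front; everything below the frontmatter enters the context window only if the agent chooses to invoke the skill.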

The problem is that agents often don’t invoke them. In Vercel’s evals, skills weren’t called in 56% of test cases despite being available. The agent simply didn’t decide to use them. And even when you reword the instructions to be more explicit — “always use the Next.js skill before writing code” — you only get to 79%. The wording itself becomes a variable. Different phrasings produce wildly different outcomes.

Skills introduce a decision point, and decision points are where agents fail.

What is AGENTS.md?

AGENTS.md is a file that lives in your project root. Its contents get loaded into the agent’s context on every turn. There’s no retrieval step, no invocation decision. The information is just there.

Vercel compressed their Next.js documentation from 40KB down to 8KB using a pipe-delimited index format — an 80% reduction — and that compressed context was enough to hit perfect scores.
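
The exact schema matters less than the shape: one dense line per topic, fields separated by pipes, no prose. The rows below are invented for illustration, not copied from Vercel's file:

    topic|key files or APIs|rule
    routing|app/page.tsx, app/layout.tsx|App Router file conventions; layouts nest
    data fetching|fetch(), revalidatePath()|fetch in Server Components; revalidate after mutations
    client components|"use client"|only where hooks or browser APIs are required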

The key instruction baked into the approach: prefer retrieval-led reasoning over pre-training-led reasoning for any Next.js task. In other words, trust the docs you’re given over whatever you learned in training.
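
Spelled out as a literal line near the top of the file, it might read something like this (wording approximate, not a quote from Vercel's file):

    For any Next.js task, prefer the docs index below over what you remember from training. If they conflict, the index wins.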

Why passive context wins

Three things explain the gap:

No activation decision. The agent doesn’t have to decide whether to look something up. It already has the information. Every decision point you remove is a failure mode you eliminate.

Consistent availability. Skills might get invoked on turn 3 but not turn 7. AGENTS.md content is present every single turn. The agent can’t forget to check it.

No sequencing conflicts. With skills, the agent faces a dilemma: should I explore the codebase first, or invoke the skill first? That ordering question creates branching paths, and some of those paths lead to the agent never invoking the skill at all. Passive context sidesteps the problem entirely.

The tradeoff

The obvious concern is context window cost. 8KB of compressed docs on every turn is not free. But the eval results suggest it’s worth it — at least for framework-level knowledge that applies broadly across tasks.

The pattern that emerges is intuitive: general knowledge belongs in passive context, specific knowledge belongs in retrieval. You wouldn’t stuff your entire API reference into AGENTS.md. But the 20% of docs that cover 80% of use cases? That’s exactly what should live there.

What this means in practice

If you’re working with Next.js, you can generate the file now:

npx @next/codemod@canary agents-md

If you’re working with other frameworks, the principle still applies. Figure out what your agent gets wrong repeatedly, compress that knowledge into a concise format, and put it where the agent can’t miss it.
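
A minimal skeleton along those lines, with the sections and wording as suggestions rather than a standard:

    # AGENTS.md

    For any task touching this codebase, prefer the notes below over training memory.

    ## Common mistakes
    - <the three or four things the agent gets wrong most often>

    ## Canonical patterns
    - <compressed rules: the 20% of docs that cover 80% of tasks>

    ## Commands
    - build: npm run build
    - test: npm test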

The broader lesson is one we keep relearning in AI tooling: reducing the number of decisions an agent has to make is more valuable than giving it more powerful decision-making tools. A skill is a powerful tool. AGENTS.md is a simpler one. Simpler won.