The night shift: how to stop babysitting your AI agents

There’s a resource asymmetry most people ignore when building with AI agents.

Your attention is scarce, expensive, and non-renewable. Every time you pivot to review a diff, babysit a failing test, or course-correct an agent that went sideways, you’re spending the most valuable thing you have. Meanwhile, the agent is sitting idle while you think.

Agent compute is the opposite. It’s cheap, parallelizable, and available at 3am. Your agents don’t need sleep. They don’t get distracted. They can run the same test suite 20 times and not care.

The natural conclusion: stop mixing these two shifts.

Day shift: thinking work only

During working hours, your job is the high-leverage stuff that only you can do.

Gathering requirements and understanding the problem
Writing detailed spec documents with edge cases spelled out
Making architectural decisions
Reviewing final output and writing the next spec

Notice what’s not on that list: writing code, running tests, fixing lint errors, generating boilerplate, updating documentation. Those are agent work.

The key discipline is treating spec-writing as thinking work, not task delegation. You’re not building a to-do list for the agent. You’re crystallizing your own understanding of the problem. The spec forces you to work through the edge cases before the agent has to.

When you write the spec, you’re also scoping the agent’s night. A vague spec produces vague results at 2am that you’ll spend your morning untangling. A precise spec produces a PR you can review in ten minutes.

Night shift: autonomous execution

Once your spec is committed, the agent takes over. A well-structured night shift looks like this:

Repo cleanup and baseline. Run all tests first. If anything is already broken, document it before touching anything else.
Spec analysis. Load the spec, the relevant existing code, and any documentation the spec references.
Testing plan before implementation. Write the tests first. This forces the agent to understand what “done” means before it starts building.
Implementation against tests. Build until the tests pass — not until it seems right, until the tests pass.
Multi-perspective code review. Run the diff through several different lenses: correctness, performance, security, accessibility, maintainability. Each perspective catches different categories of problems.
Full regression. Not just the new tests. Everything.
Documentation and commit. Update what changed, write a detailed commit message, log the work.

The critical piece is step 3: tests before implementation. Without that gate, agents write code that looks right and silently misses edge cases. Tests are the spec translated into something verifiable. The agent can’t hallucinate its way past a failing assertion.

What makes this work

Two things are non-negotiable.

Rigorous automated tests. If your test coverage is thin, the night shift produces unverified output you have to manually review — which defeats the entire point. Build out your test suite first. The night shift is only as reliable as your tests.

A dedicated machine. The night shift needs a machine that doesn’t go to sleep. A Mac Mini running headless with tmux is the right hardware for this. Your laptop isn’t. The agent work and the daytime work shouldn’t compete for the same machine.

Combine these with persistent context — AGENTS.md files that give the agent everything it needs without you re-explaining it — and the loop runs reliably without you in it.

The payoff

Teams running workflows like this report substantially faster velocity, less context-switching, and sustained enjoyment of the work. That last one matters more than people admit. The parts of software development that produce burnout are the tedious, repetitive parts. The night shift handles those.

What’s left for you is the interesting work: understanding the problem, making real decisions, reviewing output that’s already good and making it better.

That’s the job worth doing. Let the agents do the rest.