The agents are here: what autonomous coding means for engineers
A new wave of AI agents can plan, write, and ship code end to end. We dig into what actually works today — and what's still hype.
Six months ago, “AI coding” meant autocomplete. You typed, a model guessed the next few lines, and you took it or left it. Today the frontier looks different: agents that read an issue, explore a codebase, write a patch across a dozen files, run the tests, and open a pull request — sometimes without a human touching the keyboard in between.
That shift is real, and it's worth taking seriously. But the gap between a flashy demo and a tool you'd trust on a Friday deploy is still wide. Here's what autonomous coding actually does well today, where it falls apart, and how the day-to-day of being an engineer changes because of it.
What “agentic” actually means
An agent is just a model wrapped in a loop. Instead of producing one response and stopping, it's given a goal, a set of tools — a shell, a file editor, a test runner, a browser — and permission to use them repeatedly until it decides the goal is met. The model proposes an action, the harness executes it, the result is fed back, and the cycle repeats.
The interesting part isn't the model getting smarter on its own. It's the feedback. A model that can run the test suite and read the failure is operating with far more information than one guessing in the dark — and that single capability explains most of the jump in quality over the last year.
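A minimal sketch of that loop, in TypeScript to match the article's own example. Everything here is an assumption for illustration: `Model`, `Tool`, `Action`, `runAgent`, and the scripted stand-in model are hypothetical names, not any product's API, and a real harness would call an LLM and run real commands. The scripted model reacts only to the latest tool result, which is enough to show how a failing test run steers the next action.

```typescript
// Sketch of an agent loop: propose an action, execute it, feed the result back.
// Model, Tool, Action, and runAgent are hypothetical names, not a real framework API.
type Action =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "done"; summary: string };

interface Model {
  // Given the goal and every observation so far, choose the next action.
  propose(goal: string, history: string[]): Action;
}

type Tool = (input: string) => string;

function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  goal: string,
  maxSteps = 20,
): { summary: string; history: string[] } {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = model.propose(goal, history);
    if (action.kind === "done") return { summary: action.summary, history };
    // The harness, not the model, executes the action...
    const tool: Tool | undefined = tools[action.tool];
    const result = tool ? tool(action.input) : `unknown tool: ${action.tool}`;
    // ...and records the result so the next proposal can react to it.
    history.push(`${action.tool}(${action.input}) -> ${result}`);
  }
  return { summary: "step budget exhausted", history };
}

// Toy environment: one "file" and a "test runner" that checks it.
let source = "return a - b;";
const tools: Record<string, Tool> = {
  test: () => (source === "return a + b;" ? "PASS" : "FAIL: add(2, 2) returned 0"),
  edit: (input) => {
    source = input;
    return "ok";
  },
};

// Scripted stand-in for an LLM: it looks only at the latest observation.
const scriptedModel: Model = {
  propose(_goal, history) {
    const last = history[history.length - 1] ?? "";
    if (last.endsWith("PASS")) return { kind: "done", summary: "tests pass" };
    if (last.includes("FAIL")) return { kind: "tool", tool: "edit", input: "return a + b;" };
    return { kind: "tool", tool: "test", input: "" };
  },
};

const outcome = runAgent(scriptedModel, tools, "make add() pass its tests");
console.log(outcome.summary, outcome.history.length); // tests pass 3
```

The loop runs the tests, reads the failure, edits, reruns, and stops only when the test tool reports a pass. The `maxSteps` budget is a design choice worth keeping in any real harness: without it, a model that never declares itself done would loop forever.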
What works today
Used inside its competence, an agent is genuinely useful. The tasks where it shines tend to share a shape: well-scoped, verifiable, and bounded by an existing pattern in the codebase.
- Mechanical refactors — renaming a concept across a repo, migrating an API, splitting a module — where “correct” is checkable.
- Test writing, especially filling in coverage for code that already exists and behaves.
- First drafts of boilerplate: a new route, a CRUD endpoint, a config file that mirrors ten others.
- Bug fixes that come with a reproduction. Hand it a failing test and it will often work backward to a real fix.
- Translating intent into a starting point — turning a paragraph of “here's what I want” into a diff you can react to.
Notice the through-line: in every case there's a cheap way to check the work. The agent isn't trusted because it's smart; it's trusted because the loop it runs in can catch its mistakes before you ever see them.
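For the refactoring case, one cheap check is a differential test: run the old and new implementations on the same inputs and flag every disagreement. This is a sketch under stated assumptions: it presumes pure functions and hand-picked sample inputs, and `findMismatches` and the `totalCents*` functions are hypothetical examples, not from any library.

```typescript
// Differential check for a mechanical refactor: feed the same inputs to the old and
// new implementations and return every input where their outputs disagree.
function findMismatches<I, O>(
  oldImpl: (input: I) => O,
  newImpl: (input: I) => O,
  inputs: readonly I[],
  equal: (a: O, b: O) => boolean = (a, b) => a === b,
): I[] {
  return inputs.filter((input) => !equal(oldImpl(input), newImpl(input)));
}

// Hypothetical example: a hand-written loop the agent replaced with reduce().
function totalCentsOld(items: number[]): number {
  let sum = 0;
  for (const x of items) sum += x;
  return sum;
}
const totalCentsNew = (items: number[]): number => items.reduce((acc, x) => acc + x, 0);

// A plausible-looking but wrong rewrite that silently drops negative amounts (refunds).
const totalCentsDropsRefunds = (items: number[]): number =>
  items.filter((x) => x > 0).reduce((acc, x) => acc + x, 0);

const samples: number[][] = [[], [1], [100, 250, 5], [-3, 3]];
console.log(JSON.stringify(findMismatches(totalCentsOld, totalCentsNew, samples))); // []
console.log(JSON.stringify(findMismatches(totalCentsOld, totalCentsDropsRefunds, samples))); // [[-3,3]]
```

The check is only as good as its inputs: the broken rewrite is caught here because one sample contains a negative amount. Richer inputs (for example from a property-based test generator) widen the net without changing the idea.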
Where it still breaks
Push past that boundary and the cracks show. Agents struggle with tasks that require holding the whole system in mind, weighing trade-offs no test can express, or making a judgment call about what should be built rather than how.
The model is a confident junior engineer with no memory of yesterday and no stake in tomorrow. It will produce something plausible every single time — which is exactly the problem.
Plausibility is the failure mode to watch. An agent rarely says “I don't know.” It produces code that looks right, names things sensibly, and fails in ways that survive a quick skim. The cost of review goes up, not down, because the obvious errors are gone and the subtle ones remain.
It also has no taste. It can't tell you the feature is a bad idea, that the abstraction will hurt in six months, or that the simplest fix is to delete the code instead of patching it. Those are the decisions that actually define the job.
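To make the plausibility problem concrete, here is a hypothetical example of code that survives a skim. The function and its spec table are invented for illustration; the calendar rule they are checked against is the standard Gregorian one. The function passes the years most reviewers would try and fails only on century years, which is the kind of error a quick read misses and a written-down spec catches.

```typescript
// Hypothetical agent output: short, well named, and right for every year you'd spot-check.
// Deliberately incomplete: it ignores the century rule.
function isLeapYear(year: number): boolean {
  return year % 4 === 0;
}

// The contract written as data: everyday cases plus the century edge cases the
// Gregorian rule carves out (divisible by 100 but not by 400 means not a leap year).
const spec: Array<[year: number, expected: boolean]> = [
  [2023, false],
  [2024, true],
  [1900, false],
  [2000, true],
  [2100, false],
];

const failures = spec
  .filter(([year, expected]) => isLeapYear(year) !== expected)
  .map(([year]) => year);
console.log(failures.join(", ")); // 1900, 2100
```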
A new loop for engineers
The practical answer isn't “use agents” or “don't.” It's to redesign your own loop around them: treat the agent as a fast, tireless contributor whose output is cheap to generate and therefore must be expensive to merge. In practice, that means leaning hard on the things that make output verifiable.
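import { exec } from "node:child_process";
// Treat the agent like CI: nothing merges until the gates pass.
type GateResult = { ok: boolean; output: string };
// Run a shell command; a non-zero exit resolves with ok: false instead of rejecting.
function run(cmd: string): Promise<GateResult> {
  return new Promise((resolve) => {
    exec(cmd, (error, stdout, stderr) => resolve({ ok: error === null, output: stdout + stderr }));
  });
}
// Hypothetical handle to the coding agent; the real interface depends on your tooling.
declare const agent: { fix(feedback: string): Promise<void> };
const gates = [
  () => run("npm run typecheck"),
  () => run("npm test -- --runInBand"),
  () => run("npm run lint"),
] as const;
async function readyForReview(): Promise<boolean> {
  for (const gate of gates) {
    const { ok, output } = await gate();
    if (!ok) {
      // Feed the failure back into the loop instead of asking a human.
      await agent.fix(output);
      return false; // the caller re-runs the gates on the agent's next attempt
    }
  }
  return true; // now, and only now, a person looks at the diff.
}
The mindset shift is simple to state and hard to internalize: the bottleneck moves from writing to verifying. The engineer who thrives spends less time producing lines and more time defining what “correct” means, building the harness that proves it, and reviewing with the assumption that something plausible-but-wrong is hiding in the diff.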
- Write the spec and the tests first — they're the contract the agent works against.
- Let the agent draft the implementation and iterate against your gates, not your patience.
- Review the diff as if a stranger wrote it, because effectively one did.
- Keep each change small enough that review is actually possible.
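The last point can be enforced mechanically, as one more gate next to typecheck, test, and lint. A minimal sketch, assuming the diff is measured with `git diff --numstat`, which prints one line per file as added count, deleted count, and path separated by tabs, with `-` in place of counts for binary files. The names `measureDiff` and `smallEnoughToReview` and the 400-line budget are hypothetical choices, not a standard.

```typescript
// Input: the text printed by `git diff --numstat <base>`, one line per file:
// "<added>\t<deleted>\t<path>", with "-" for both counts on binary files.
interface DiffSize {
  files: number;
  changedLines: number;
  binaryFiles: number;
}

function measureDiff(numstat: string): DiffSize {
  const size: DiffSize = { files: 0, changedLines: 0, binaryFiles: 0 };
  for (const line of numstat.split("\n")) {
    if (line.trim() === "") continue; // skip the trailing newline git emits
    const [added, deleted] = line.split("\t");
    size.files += 1;
    if (added === "-" || deleted === "-") {
      size.binaryFiles += 1; // git reports no line counts for binary files
      continue;
    }
    size.changedLines += Number(added) + Number(deleted);
  }
  return size;
}

// Hypothetical team budget; pick whatever your reviewers can actually read carefully.
const MAX_CHANGED_LINES = 400;

function smallEnoughToReview(numstat: string, budget = MAX_CHANGED_LINES): boolean {
  return measureDiff(numstat).changedLines <= budget;
}

const sample = [
  "12\t3\tsrc/routes/users.ts",
  "40\t0\tsrc/routes/users.test.ts",
  "-\t-\tassets/logo.png",
].join("\n");
console.log(JSON.stringify(measureDiff(sample))); // {"files":3,"changedLines":55,"binaryFiles":1}
console.log(smallEnoughToReview(sample)); // true
```

Changed-line count is a crude proxy for reviewability; it says nothing about how tangled a change is, so treat the budget as a prompt to split work rather than a hard rule.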
What this means for your job
The fear is that agents replace engineers. The more accurate read is that they replace the part of engineering that was already the least valuable — the rote translation of a clear intent into syntax — and they make everything around that part more important.
Judgment, system design, knowing what not to build, and the ability to tell when something is subtly wrong: those don't get automated by a confident junior in a loop. If anything, they become the whole job. The engineers who win the next few years won't be the ones who resist the tools or the ones who trust them blindly — they'll be the ones who build the verification around them faster than everyone else.
The agents are here. They're not coming for the work that matters. They're coming for the keyboard — and handing you the review queue.