Getting Real Work Out of AI Coding Agents

Most people use Claude Code and Cursor like fancy autocomplete and wonder why the magic runs out. The teams getting real leverage treat them as agents you delegate whole tasks to — and that’s a different skill.

Dhileep Kumar · 7 min read

AI coding tools have split into two things wearing similar names. There’s autocomplete — the inline suggestions that finish your line — and there’s the agent, which takes a written task and edits across files, runs commands, and comes back with a diff. Most developers use Claude Code, Cursor, and their cousins as the former and are mildly disappointed. The teams getting real leverage use them as the latter, and it’s a genuinely different skill.

The shift is from typing-assistance to delegation: instead of writing the code and letting the tool finish your sentences, you describe a unit of work and let the agent attempt the whole thing while you review. Done well, it’s like having a fast, tireless junior who needs clear instructions and careful supervision. Done badly, it’s a confident intern committing plausible nonsense. The difference is mostly how you work with it.

Autocomplete vs agent

These are different tools for different moments, and conflating them is why people feel let down. Knowing which mode you’re in changes how you should drive.

  • Autocomplete is for flow. Inline completions keep you in the editor, finishing boilerplate and obvious next lines. You stay the author; it just types faster.
  • An agent is for tasks. You hand it a goal — “add pagination to the orders endpoint and its tests” — and it plans, edits multiple files, and runs the tests. You become the reviewer.
  • The mental shift is delegation. Stop thinking “how do I write this?” and start thinking “how do I describe this so someone else gets it right?”
  • Supervision is non-negotiable. The agent is fast and literal, not wise. Every diff it produces is a proposal you approve, not a fact you accept.
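
One way to make the delegation habit concrete is to write each task as a small structured brief before handing it over. The sketch below is illustrative only: `TaskBrief`, its fields, the five-word threshold, and the file paths are all hypothetical, not part of any agent's API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """A hypothetical structure for one delegated unit of work."""

    goal: str  # what should be true when the task is done
    files: list[str] = field(default_factory=list)  # files the agent may touch
    acceptance: list[str] = field(default_factory=list)  # checks that define "done"

    def problems(self) -> list[str]:
        """Return the reasons this brief is too vague to delegate."""
        issues = []
        # Assumed heuristic: a goal under five words rarely names a concrete outcome.
        if len(self.goal.split()) < 5:
            issues.append("goal is too short to be specific")
        if not self.files:
            issues.append("no files named, so scope is unbounded")
        if not self.acceptance:
            issues.append("no acceptance checks, so 'done' is undefined")
        return issues

    def render(self) -> str:
        """Format the brief as plain text to paste into an agent prompt."""
        lines = [f"Goal: {self.goal}", "Files you may touch:"]
        lines += [f"  - {path}" for path in self.files]
        lines.append("Done when:")
        lines += [f"  - {check}" for check in self.acceptance]
        return "\n".join(lines)


vague = TaskBrief(goal="Improve the auth code")
specific = TaskBrief(
    goal="Add pagination to the orders endpoint and its tests",
    files=["app/api/orders/route.ts", "tests/orders.test.ts"],  # hypothetical paths
    acceptance=["npm run build passes", "new pagination tests pass"],
)

print(vague.problems())
print(specific.render())
```

The point is not the class itself but the forcing function: if you cannot fill in the goal, the files, and the acceptance checks, the task is probably not ready to delegate.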

What they’re good and bad at

Agents have a sharp competence profile. Aiming them at what they’re good at — and keeping them off what they’re not — is most of the skill.

  • Great at the well-trodden. Boilerplate, CRUD endpoints, tests for existing code, refactors, and using an unfamiliar library’s standard API — anything with a clear pattern and lots of prior art.
  • Great at translation. Porting code between languages, converting formats, and applying a mechanical change across many files — tedious work they do tirelessly.
  • Weak on novel architecture. Genuinely new design, ambiguous requirements, and decisions with subtle trade-offs are where they confidently pick wrong.
  • Weak with sprawling context. The more files and hidden constraints a task spans, the more likely they lose the thread. Narrow, well-bounded tasks win.

How to actually work with one

The single biggest upgrade is giving the agent a standing context file (a CLAUDE.md, AGENTS.md, or .cursorrules) that states how your project works, so you don’t re-explain it every time. Pair that with small, specific tasks and a hard verification step:

```md
# Project rules for the agent (e.g. CLAUDE.md / .cursorrules)

- Stack: Next.js 14 (App Router), TypeScript strict, Tailwind v3.
- Before you call a task done: npm run build must pass.
- Tests: add one for every new function; never delete a failing
  test to make the suite green.
- Style: match the surrounding file. No new dependencies without asking.
- Scope: touch only the files the task names, and end by summarizing
  exactly what you changed and why.
```

A file like this is the difference between an agent that fits your codebase and one that fights it. Combine it with tasks scoped small enough to review in one sitting — and a rule that the build and tests must pass before anything counts as done — and you get the leverage without the mess.
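
The "must pass before anything counts as done" rule is easier to keep when it is a script rather than a habit. Here is a minimal sketch of such a gate, assuming `npm` is on your PATH and that `npm test` is how your project runs its tests; `run_checks`, `all_passed`, and `report` are hypothetical helper names, not part of any tool.

```python
import subprocess
import sys


def run_checks(commands: list[list[str]]) -> list[tuple[str, bool]]:
    """Run each command in order and stop at the first failure."""
    results = []
    for cmd in commands:
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True)
            passed = completed.returncode == 0
        except FileNotFoundError:
            # The executable is not installed or not on PATH: treat as a failure.
            passed = False
        results.append((" ".join(cmd), passed))
        if not passed:
            break
    return results


def all_passed(results: list[tuple[str, bool]]) -> bool:
    """True only if at least one check ran and every check passed."""
    return bool(results) and all(ok for _, ok in results)


def report(results: list[tuple[str, bool]]) -> int:
    """Print one line per check and return a process exit code."""
    for name, ok in results:
        print("PASS" if ok else "FAIL", name, file=sys.stderr)
    return 0 if all_passed(results) else 1


# Assumed commands for the example stack above; swap in your own.
DEFAULT_CHECKS = [["npm", "run", "build"], ["npm", "test"]]
```

Calling `report(run_checks(DEFAULT_CHECKS))` after each agent run returns 0 only when every check passed, so it can back a pre-merge hook or a CI step.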

An AI coding agent is a force multiplier on a clear plan and a magnifier of a vague one. It won’t save a task you couldn’t describe; it will just produce the wrong thing faster.

Where it goes wrong

  • Vague tasks. “Improve the auth code” gives the agent nowhere to aim. “Add rate limiting to the login route, 5 attempts per minute, with a test” gives it everything.
  • Skipping verification. The diff looks right and you merge it. Agents produce confident, plausible, wrong code — run the build and the tests, every time.
  • Giant diffs. A task that rewrites twenty files is impossible to review and easy to slip a bug through. Keep the unit of work small.
  • Trusting it on the unfamiliar. If you can’t evaluate the output yourself, you can’t supervise it. Don’t delegate what you couldn’t review.
  • No context file. Re-explaining your stack and conventions in every prompt wastes effort and produces inconsistent results. Write it down once.
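
The giant-diff failure in particular can be caught mechanically. As a rough sketch, the function below reads the output of `git diff --numstat` (one tab-separated row per file: lines added, lines deleted, path) and flags changes that exceed a review budget. The limits of 5 files and 300 lines are assumptions chosen for illustration, not recommendations from any tool.

```python
def summarize_numstat(numstat: str) -> tuple[int, int]:
    """Return (files_changed, lines_changed) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        added, deleted, _path = parts
        files += 1
        # Binary files are reported as "-" for both counts, so only sum digits.
        if added.isdigit():
            lines += int(added)
        if deleted.isdigit():
            lines += int(deleted)
    return files, lines


def too_big_to_review(numstat: str, max_files: int = 5, max_lines: int = 300) -> bool:
    """Flag a diff that exceeds the (assumed) one-sitting review budget."""
    files, lines = summarize_numstat(numstat)
    return files > max_files or lines > max_lines


# Sample output with hypothetical paths; the last row is a binary file.
sample = (
    "12\t3\tapp/api/login/route.ts\n"
    "40\t0\ttests/login.test.ts\n"
    "-\t-\tpublic/logo.png\n"
)
print(summarize_numstat(sample))
print(too_big_to_review(sample))
```

If the check trips, that usually means the task should be split and re-delegated in smaller pieces rather than reviewed harder.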

A force multiplier, not a replacement

The developers getting the most from these tools aren’t the ones who trust them most — they’re the ones who supervise them best. They delegate the tedious, well-defined work, keep the architecture and the judgment calls for themselves, and treat every diff as a code review rather than a gift. The agent makes them faster; it doesn’t make them optional.

The skill that’s emerging isn’t writing code, exactly — it’s specifying it: breaking work into reviewable pieces, describing each precisely, and verifying the result. That was always the senior part of the job. AI coding agents just made it the whole job, and rewarded the people who were already good at it.
