
The agents are here — so I built one in 300 lines to see how thin it really is

I built mini-cursor, a Cursor-style AI panel for VS Code, in one source file and ~300 lines. The streaming plumbing was trivial; the 12K-char context cap and the line-number problem showed me where the real work — RAG and safe multi-file edits — actually begins.

Dhileep Kumar · Jun 6, 2026 · 7 min read


Six months ago, "AI coding" mostly meant autocomplete: you typed, a model guessed the next few lines, and you took it or left it. Today the pitch is bigger. Agents read an issue, explore a codebase, edit a dozen files, run the tests, and open a pull request. The demos are genuinely impressive, and it is easy to walk away believing there is some deep new machinery behind the curtain.

I wanted to know how thick that curtain actually is. So instead of writing another think-piece about autonomous coding, I built the smallest honest version of the thing everyone is excited about: a Cursor-style AI panel that lives in my editor, reads what I am looking at, and streams a model's answer back into a chat bubble. I called it mini-cursor. It is one real source file and about 300 lines, and building it taught me more about where the hype ends than any benchmark could.

What "agentic" actually means — and what mine is not

An agent, stripped of marketing, is a model wrapped in a loop: give it a goal, give it tools (a shell, a file editor, a test runner), and let it act repeatedly until it decides it is done. The model proposes an action, the harness runs it, the result is fed back, and the cycle repeats. The magic is not the model getting smarter mid-task; it is the feedback. A model that can run your tests and read the failure is working with far more information than one guessing in the dark.
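The loop described above can be sketched in a few lines. This is an illustration only, not mini-cursor's code (mini-cursor has no such loop yet): `runAgent`, `Action`, `ModelFn`, and the `maxSteps` guard are hypothetical names, and the model is a plain function you pass in, so the sketch runs without an API key.

```typescript
// Hypothetical sketch of an agent loop; none of these names come from mini-cursor.
type Action =
  | { kind: "tool"; tool: string; input: string } // the model wants the harness to run something
  | { kind: "done"; answer: string };             // the model decides it is finished

type Tool = (input: string) => Promise<string>;
type ModelFn = (transcript: string[]) => Promise<Action>;

async function runAgent(
  goal: string,
  model: ModelFn,
  tools: Map<string, Tool>,
  maxSteps = 10, // assumption: a hard cap so a confused model cannot loop forever
): Promise<string> {
  const transcript: string[] = ["GOAL: " + goal];

  for (let step = 0; step < maxSteps; step++) {
    // 1. The model proposes an action based on everything seen so far.
    const action = await model(transcript);
    if (action.kind === "done") return action.answer;

    // 2. The harness runs it (or reports that the tool does not exist).
    const tool = tools.get(action.tool);
    const result = tool
      ? await tool(action.input)
      : "error: unknown tool " + action.tool;

    // 3. The result is fed back, so the next proposal sees what actually happened.
    transcript.push("ACTION: " + action.tool + " " + action.input);
    transcript.push("RESULT: " + result);
  }
  return "stopped: step limit reached";
}
```

The only part that carries information the model did not already have is step 3: the transcript grows with real tool output, which is the feedback the paragraph above is talking about.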

I want to be precise about what I built, because the honest version is the interesting one. mini-cursor is explicitly a v1 skeleton. It does chat plus context plus streaming. It does not yet do the two things that make an agent an agent: it has no retrieval over the whole repository, and it cannot make multi-file edits on its own. Those are written down in the roadmap, not in the code. So the rest of this post is really about the layer underneath the agent hype — the plumbing that is genuinely easy, and the two hard problems it makes impossible to ignore.

The part that is shockingly thin

Here is the uncomfortable truth I hit almost immediately: getting a competent AI chat panel into your editor is not hard anymore. The extension itself is 168 lines of TypeScript, and the entire app, including the webview markup and CSS, is roughly 307 lines. It compiles to a single esbuild bundle of about 162 KB, most of which is the Anthropic SDK itself, not my code. There is no framework in the webview at all: it is vanilla JS styled entirely with VS Code's own CSS variables, so it matches whatever theme you already use.

The streaming that makes these tools feel alive is a handful of lines. The SDK exposes a streaming call, and you subscribe to text deltas as they arrive and forward each one to the UI. I did not run a stopwatch on it, so I will not quote latency numbers, but the wiring is genuinely this small:

typescript

const stream = client.messages.stream({
  model,                 // default: claude-sonnet-4-6
  max_tokens: maxTokens, // default: 4096
  system: "You are Mini Cursor, a coding assistant embedded in VS Code. Be concise.",
  messages: this.history, // full prior turns, kept in the extension host
});

let full = ""; // accumulates the complete reply so it can be stored in history
stream.on("text", (delta) => {
  full += delta;
  this.post({ type: "delta", text: delta }); // append to the current bubble
});

await stream.finalMessage();
this.history.push({ role: "assistant", content: full });

That is the token-by-token effect you see in every AI editor, reduced to its skeleton. One detail worth calling out: the conversation history lives in the extension host, not in the webview. VS Code can tear a webview down when you hide it, so if I kept the transcript there it would vanish. Storing it host-side (plus setting retainContextWhenHidden on the panel) is what lets a session survive being hidden and re-shown. It is a small architectural choice that has nothing to do with AI and everything to do with the platform.

Security was similarly a matter of using the platform correctly rather than inventing anything. The API key never touches settings or source; it goes into VS Code SecretStorage via a password input box. And because the extension host runs in Node, the Anthropic SDK works directly, with no browser CORS shim to fight. None of this is clever. That is the point.
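The key-handling flow can be sketched as a small function. This is a sketch under assumptions, not the extension's actual code: `SecretStore` and `AskForKey` are structural stand-ins for `context.secrets` (VS Code's `SecretStorage`) and for a wrapper around `vscode.window.showInputBox({ password: true })`, so the example runs outside the editor, and both `ensureApiKey` and the `miniCursor.apiKey` storage key are hypothetical names.

```typescript
// Stand-ins for the two VS Code APIs involved. In a real extension you would
// pass `context.secrets` and a function that calls
// `vscode.window.showInputBox({ password: true })`.
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
}
type AskForKey = () => PromiseLike<string | undefined>;

const KEY_NAME = "miniCursor.apiKey"; // hypothetical storage key, not from the source

async function ensureApiKey(
  secrets: SecretStore,
  ask: AskForKey,
): Promise<string | undefined> {
  // Already stored: reuse it and never prompt again.
  const existing = await secrets.get(KEY_NAME);
  if (existing) return existing;

  // Prompt once. A cancelled or empty input stores nothing.
  const entered = (await ask())?.trim();
  if (!entered) return undefined;

  await secrets.store(KEY_NAME, entered);
  return entered;
}
```

Because the key only ever flows between the prompt and the secret store, it never appears in `settings.json` or in the source tree.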

Wiring a streaming, context-aware AI panel into an editor is now a weekend of plumbing. Everything that makes it actually good lives in the two problems the plumbing conveniently ignores.

Hard problem one: what do you even send the model?

The moment I moved past "echo the user's question" I hit the real question at the center of every AI coding tool: context. The model cannot see your codebase. It only knows what you put in the prompt. So on every turn, mini-cursor gathers editor context and prepends it to the question. The logic is deliberately blunt, and the bluntness is the lesson:

typescript

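// doc and sel are the active editor's document and selection;
// rel is the file's path as it is shown to the model.
const selectedText = sel.isEmpty ? "" : doc.getText(sel);
const parts = ["Active file: " + rel + " (" + doc.languageId + ")"];

if (selectedText) {
  // Header with 1-based line numbers, followed by the selected text itself.
  parts.push("Selected lines " + (sel.start.line + 1) + "-" + (sel.end.line + 1) + ":\n" + selectedText);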
} else {
  // No selection -> send the whole (small) file, capped to keep tokens sane.
  const full = doc.getText();
  const capped = full.length > 12000
    ? full.slice(0, 12000) + "\n...(truncated)"
    : full;
  parts.push("File contents:\n" + capped);
}

If you have text selected, it sends just those lines. If nothing is selected, it sends the whole file — but caps it at 12,000 characters and appends a truncated marker. I chose that cap for one unglamorous reason: to keep token cost and latency bounded. Naively shipping a large file to the model on every turn is how you light money on fire and slow the loop to a crawl.

And that 12K cap is exactly where the illusion breaks. A single-file, character-capped view is fine for "explain this function" or "write a test for what I have selected." It is useless for "why does auth fail when the billing service is down," because the answer lives in three files this heuristic will never include. The cap is a placeholder for the thing that actually makes tools like Cursor good: retrieval over the whole repository. In my roadmap I name RAG as the single biggest quality lever, and building the naive version is what convinced me that is true.
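To make the gap concrete, here is a deliberately naive stand-in for retrieval: rank files by how many of the question's words they contain and keep the top few. This is plain keyword overlap, not embeddings, and it is neither what Cursor does nor anything in mini-cursor; `rankFiles`, `RepoFile`, and the sample paths in the usage note are hypothetical. It only shows the shape of the step that would replace the single-file cap.

```typescript
interface RepoFile {
  path: string;
  text: string;
}

// Lowercased words and identifiers of three or more characters. Crude on purpose.
function tokenize(s: string): Set<string> {
  return new Set<string>(s.toLowerCase().match(/[a-z_][a-z0-9_]{2,}/g) ?? []);
}

// Score each file by the number of distinct question words it contains,
// drop files with no overlap, and return the best topK.
function rankFiles(question: string, files: RepoFile[], topK = 3): RepoFile[] {
  const questionWords = tokenize(question);
  return files
    .map((file) => {
      const fileWords = tokenize(file.path + " " + file.text);
      let score = 0;
      questionWords.forEach((w) => {
        if (fileWords.has(w)) score++;
      });
      return { file, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => r.file);
}
```

Given the billing question above and a repo containing a hypothetical `src/auth/login.ts` that mentions billing, a `src/billing/client.ts`, and an unrelated `src/ui/theme.ts`, this ranks the auth and billing files and drops the theme file. It also scores common words like "the" and "does", which is one of many reasons real retrieval needs more than this.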

Hard problem two: editing without lying about line numbers

The other half of an agent is that it should change your code, not just talk about it. mini-cursor does not do this yet, on purpose, but designing for it forced a decision I did not expect to have an opinion about: how do you represent an edit? The tempting answer is line numbers: "replace lines 40 to 52." It is also a trap. Line numbers drift the instant the file changes, so an edit computed against one version silently corrupts a slightly different one. That is a well-known failure mode of naive LLM code editing.

So the roadmap pre-commits to a specific alternative, and I think it is the right instinct even before I have implemented it:

Represent every edit as a search/replace block — the exact text to find and the exact text to put in its place — never a line range.
Require an explicit approval step before anything is written to disk, so a plausible-but-wrong edit is caught by a human, not by a broken working tree.
Keep each change small enough that reviewing it is actually possible, because cheap-to-generate output has to be expensive to merge.

Notice that these are not model-quality problems. A smarter model does not save you from line-number drift; a better harness does. The hard part of "AI that edits your code" turns out to be the boring engineering around the model, not the model itself.
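A search/replace edit with a strict matching rule might look like the sketch below. It is an illustration of the idea, not the roadmap's eventual implementation: `applyEdit`, `SearchReplaceEdit`, and the rule "the search text must match exactly once" are assumptions made here. The approval step is out of scope; the returned text is what you would show as a diff before writing anything to disk.

```typescript
interface SearchReplaceEdit {
  search: string;  // exact text expected in the file
  replace: string; // exact text to put in its place
}

type ApplyResult =
  | { ok: true; text: string }
  | { ok: false; reason: string };

function applyEdit(source: string, edit: SearchReplaceEdit): ApplyResult {
  if (edit.search.length === 0) {
    return { ok: false, reason: "empty search text" };
  }

  const first = source.indexOf(edit.search);
  if (first === -1) {
    // The file no longer contains what the model saw: fail loudly instead of corrupting it.
    return { ok: false, reason: "search text not found" };
  }

  // A second match means the edit is ambiguous, so refuse rather than guess.
  if (source.indexOf(edit.search, first + 1) !== -1) {
    return { ok: false, reason: "search text matches more than once" };
  }

  const text =
    source.slice(0, first) + edit.replace + source.slice(first + edit.search.length);
  return { ok: true, text };
}
```

The useful property is the failure mode. If lines are added above the target, the edit still lands in the right place; if the target itself changed, the edit is rejected. A line-range edit would have applied in both cases and been wrong in both.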

Why I did not fork VS Code

One decision I am glad I made early: mini-cursor is a plain extension, not a fork of the editor. Cursor forked VS Code because it wanted control over the editor chrome itself. That is a real reason — but a reason for a later stage. For a v1 that is chat plus context plus streaming, an extension has everything you need, and tools like Cline and Continue prove you can go a long way without forking. My rule of thumb, written into the README, is simple: fork only when the UI itself is the bottleneck. Until then a fork is a maintenance tax you pay for nothing.

That decision is a microcosm of the whole exercise. The instinct with AI tooling is to reach for the biggest, most impressive-sounding architecture. Almost every time, the smaller thing is enough to learn what actually matters — and what actually matters is rarely the part the demos show off.

What building the thin version taught me

The agents are here, and the surface layer really is that thin: a streaming SDK call, a webview, a place to stash the API key, and a few hundred lines of glue. If your fear is that this plumbing is some deep moat, building it yourself is the fastest cure. You can have a working, context-aware AI panel in your editor in an afternoon.

But the two problems I ran straight into — how to select the right context out of a whole repository, and how to edit code without lying about where it lives — are exactly the ones no streaming trick solves. They are retrieval and verification, engineering problems dressed up as AI problems. That is where the real work is, and it is also, not coincidentally, where the judgment of the person building the tool still matters. The agents came for the keyboard. They did not come for the two hard problems — and after 300 lines, I am fairly sure those problems are the whole job.
