Context Engineering: Managing What Your LLM Actually Sees
Prompt engineering was about choosing your words. Context engineering is about everything else in the window — what you put in, what you leave out, and in what order. It’s the discipline that separates an LLM demo from an LLM product.
There’s a quiet shift in how good teams build with language models, and it has a clumsy name: context engineering. For two years the craft was “prompt engineering” — finding the magic phrasing that coaxed a better answer out of the model. That still matters at the margins, but it turns out the wording of your instruction is rarely what makes a real application succeed or fail. What matters is everything else in the context window: the documents, the history, the examples, the tool outputs — what you chose to include, what you left out, and the order you put it in.
Call it the difference between writing a good question and assembling a good briefing. The model can only reason over what’s in front of it, and the context window is a small, expensive, surprisingly fragile place. Context engineering is the discipline of curating it deliberately — and in 2026 it’s the skill that most separates an impressive demo from a product that holds up.
Why the context window is a resource
It’s tempting to treat the context window as free space — if the model accepts 200,000 tokens, why not fill it? Because a context window isn’t storage; it’s attention, and attention degrades as you load it. More context is not more intelligence. Past a point, it’s noise that buries the signal you actually needed.
- Context rot. As you pile in more tokens, the model’s grip on any one of them loosens. Quality peaks at some amount of context and then declines — more becomes actively worse.
- Lost in the middle. Models attend most strongly to the start and end of their context and skim the middle. A crucial fact buried halfway down can be effectively invisible.
- Cost and latency scale with tokens. Every token you include is paid for on every call, in money and in milliseconds. A bloated context is a tax you pay forever.
- Distraction. Irrelevant-but-plausible context pulls the model off course. Give it five documents when one was relevant and it may dutifully use the wrong one.
The four moves of context engineering
Almost everything in the discipline reduces to four operations on the window. You’re constantly deciding, for every token competing for a slot, whether it earns its place — and if it does, where it goes.
- Select. Pull in only what this turn needs — the relevant docs, the pertinent history — not everything you have. Retrieval and filtering are how you select; ruthless relevance is the goal.
- Structure. Order and label what you include. Put the most important material where the model looks hardest — the start and end — and use clear delimiters so it knows what’s instruction, what’s data, and what’s example.
- Compress. Summarize old conversation turns, distill long documents, and drop what’s redundant. A faithful summary in 200 tokens often beats the raw 2,000 it replaces.
- Isolate. Give sub-tasks their own clean context instead of one ever-growing thread. A focused window per job beats a single window carrying everyone’s baggage.
Trimming context in code
Here’s the most common move in practice — keeping a conversation within a token budget by windowing and summarizing, instead of letting history grow without bound. The exact policy varies, but the shape is universal.
# context.py - keep a chat within a token budget.
def build_context(system: str, history: list, budget: int, summarize):
# Always keep the system instruction and the most recent turns.
kept = []
used = tokens(system)
for msg in reversed(history): # newest first
cost = tokens(msg.text)
if used + cost > budget:
break
kept.append(msg)
used += cost
kept.reverse()
# Anything that did not fit gets compressed into one summary line.
dropped = history[: len(history) - len(kept)]
if dropped:
kept.insert(0, summarize(dropped)) # a short recap, not raw text
return [system, *kept]Notice what it does: it keeps the system instruction and the freshest turns verbatim, and it compresses everything that doesn’t fit into a single summary rather than silently truncating it. That’s the core instinct of context engineering — nothing important falls off the edge unrepresented, and nothing unimportant takes up a slot it didn’t earn.
The context window is the one part of an LLM app you fully control. The weights are fixed and the training is done — but what the model sees on this call is entirely your decision. Spend it like the scarce resource it is.
Patterns that hold up
- Summarize as you go. When a conversation or task runs long, replace old turns with a running summary. The model keeps the thread without dragging the full transcript.
- Retrieve just in time. Don’t preload everything you might need; fetch the specific fact or document at the moment the task needs it, then let it go.
- Rank before you place. When you have more relevant material than space, score it and include the best — and put the best where attention is strongest.
- Delimit clearly. Mark sections — instructions, context, examples — with unambiguous boundaries so the model never mistakes your data for your commands.
- Isolate with sub-agents. For complex jobs, spin up focused sub-tasks with their own clean windows and return only their conclusions to the main thread.
The mindset shift
The move from prompt engineering to context engineering is really a move in altitude. You stop fiddling with the wording of one message and start designing the information environment the model reasons inside — what enters it, how it’s shaped, when it’s refreshed, and what’s actively defended against. Context rot and distraction aren’t quirks to tolerate; they’re forces to engineer against, the way a systems engineer designs against latency and failure.
It’s less glamorous than discovering a magic prompt, and far more durable. Models will keep changing under you; the discipline of giving them a clean, relevant, well-ordered view of the problem won’t. Treat the context window as the product surface it actually is, and your LLM app stops being at the mercy of the model and starts being something you engineer.
Enjoyed this?
Get the next deep dive in your inbox. No spam — just the stories worth reading.
Subscribe to the newsletter