Structured Outputs: Getting Reliable JSON and Tool Calls from LLMs
Ask a model for JSON and it’ll often hand you prose, a markdown fence, and a trailing apology. For anything automated, “often valid” is a bug. Here’s how to make an LLM return data your code can actually trust.
Ask a language model to “return the result as JSON” and you’ll get JSON, usually. Sometimes you’ll get JSON wrapped in a markdown code fence. Sometimes a cheerful “Sure! Here’s your JSON:” in front of it. Sometimes a trailing comma that makes it fail to parse, or a field you never asked for, or (on the call that happens to run in production at 2 a.m.) prose with no JSON at all.
For a chatbot, “usually” is fine; a human reads it and moves on. For an automated workflow — where the model’s output feeds the next function — “usually valid” is just a bug with a delay on it. If your code parses the model’s reply, you need that reply to be valid structured data 100% of the time, not 95%. Getting from 95% to 100% is what structured outputs are for, and it’s less about prompting tricks than about using the right mechanism.
Why asking for JSON isn’t enough
The failure isn’t that the model can’t produce JSON; it’s that free-form text generation has no guarantee of structure. You’re asking a system that predicts the next likely token to produce output that’s valid under a formal grammar, and “likely” and “valid” aren’t the same constraint. Left to prompt instructions alone, the model will mostly comply and occasionally improvise. The usual failure modes:
- Conversational wrapping. The model frames the JSON with a greeting or an explanation, so your parser chokes on the prose around it.
- Markdown fences. It helpfully wraps the JSON in triple backticks — great for humans, fatal for a naive parse.
- Hallucinated or missing fields. It adds a field that isn’t in your schema, or omits a required one, and your downstream code breaks on the shape, not the syntax.
- Invalid JSON. A trailing comma, an unescaped quote, a truncated response — small syntactic slips that turn the whole payload into an exception.
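To make those failure modes concrete, here is a minimal sketch that runs hand-written stand-ins for model replies through a naive parse plus a field check. Nothing here calls a model: the replies, the REQUIRED field set, and the classify helper are all invented for illustration.

```python
import json

FENCE = "`" * 3  # three backticks, built here rather than typed literally
REQUIRED = {"category", "urgency", "summary"}  # hypothetical schema fields
GOOD = '{"category": "billing", "urgency": 5, "summary": "Payment failed"}'

# Hand-written stand-ins for model replies, one per failure mode.
REPLIES = {
    "clean": GOOD,
    "wrapped": "Sure! Here is your JSON:\n" + GOOD,
    "fenced": f"{FENCE}json\n{GOOD}\n{FENCE}",
    "trailing_comma": GOOD[:-1] + ",}",
    "truncated": GOOD[:30],
    "extra_field": GOOD[:-1] + ', "mood": "furious"}',
    "missing_field": '{"category": "billing", "urgency": 5}',
}


def classify(reply: str) -> str:
    """Report how a naive parse-then-check pipeline would treat a reply."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "syntax error"  # prose, fences, trailing commas, truncation
    if not isinstance(data, dict) or set(data) != REQUIRED:
        return "shape error"   # parses fine, but not the fields we expect
    return "ok"


results = {name: classify(reply) for name, reply in REPLIES.items()}
for name, outcome in results.items():
    print(f"{name:15} {outcome}")
```

Only the clean reply survives. The first four failures are syntax problems that a JSON-only guarantee would remove; the last two parse without complaint and still break downstream code, which is why the shape needs enforcing too.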
Three ways to get structure
There’s a ladder of reliability here, and most teams start too low on it. The higher you climb, the less you’re hoping for structure and the more you’re guaranteeing it.
- JSON mode. Many providers offer a flag that constrains the model to emit syntactically valid JSON. It guarantees the output parses (as long as the reply isn’t cut off at a token limit), but not that it matches your shape. A necessary floor, not the whole solution.
- Schema-constrained outputs. Supply an explicit schema — often from a Pydantic or Zod model — and the API constrains generation to match it: right fields, right types, every time. This is the tool you actually want for data extraction.
- Function and tool calling. Describe a function with typed parameters and the model returns a structured call to it rather than free text. Same machinery as schema outputs, framed as “which action, with what arguments”, and it’s how agents act.
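The tool-calling rung is easiest to see from the receiving side. The sketch below assumes the model's tool call arrives as JSON of the form {"name": ..., "arguments": {...}}; real providers differ on the exact wire format, and the two tools and the dispatch helper are hypothetical. It checks the call against the function's own signature before running anything.

```python
import inspect
import json


def create_refund(order_id: str, amount_cents: int) -> str:
    """Hypothetical tool the model is allowed to call."""
    return f"refund of {amount_cents} cents queued for order {order_id}"


def escalate(ticket_id: str) -> str:
    """Second hypothetical tool."""
    return f"ticket {ticket_id} escalated"


# Registry of callable tools, keyed by the name the model must use.
TOOLS = {fn.__name__: fn for fn in (create_refund, escalate)}


def dispatch(tool_call_json: str) -> str:
    """Validate a tool call against the target signature, then execute it."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    fn = TOOLS[name]
    sig = inspect.signature(fn)
    try:
        bound = sig.bind(**args)  # rejects missing or unexpected arguments
    except TypeError as exc:
        raise ValueError(f"bad arguments for {name}: {exc}") from exc
    for param, value in bound.arguments.items():
        expected = sig.parameters[param].annotation
        if not isinstance(value, expected):
            raise ValueError(f"{name}.{param} must be {expected.__name__}")
    return fn(*bound.args, **bound.kwargs)


print(dispatch(
    '{"name": "create_refund", "arguments": {"order_id": "A-17", "amount_cents": 1250}}'
))
```

The type check is deliberately simple and only handles plain annotations such as str and int. The point is the order of operations: the model proposes an action, your code checks it, and only then does anything run.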
Structured output in code
Here’s the pattern with a schema. You define the shape you want as a typed model, hand it to the API, and get back an object that already matches — no parsing, no defensive cleanup, no praying.
    # extract.py - schema-constrained structured output.
    from pydantic import BaseModel


    class Ticket(BaseModel):
        category: str  # one of: billing, bug, feature
        urgency: int   # 1 (low) to 5 (urgent)
        summary: str


    # `client` stands in for your provider's SDK client; the method and
    # parameter names below are illustrative and vary by provider.
    # The API constrains generation to the schema and returns a typed object.
    ticket = client.parse(
        model="your-model",
        schema=Ticket,
        input="My payment failed three times and I am furious.",
    )

    # No parsing, no cleanup - ticket is already a validated Ticket.
    print(ticket.category, ticket.urgency)  # e.g. billing 5

The win isn’t shorter code; it’s the removed class of bugs. There’s no string to clean, no fence to strip, no try/except around a parse that might explode. The model’s output arrives as a typed object that already matches the schema, so the failure modes listed earlier largely disappear: the structure is enforced at generation time, not hoped for afterward.
The reliable way to get data out of a language model is to stop asking nicely and start constraining the output. Prompts request a format; schemas enforce one. In automation, only enforcement counts.
Getting it right in production
- Define a real schema. Specify types, enums, and which fields are required. The tighter the schema, the less room the model has to surprise you.
- Validate anyway, and retry. Even constrained output can occasionally miss; validate against your schema and, on failure, retry once with the error fed back. Belt and suspenders.
- Keep schemas small. A schema with forty fields is harder to fill correctly than three focused ones. Decompose big extractions into smaller, reliable calls.
- Handle refusals. A constrained model still needs a way to say “I don’t know” — include a null or an explicit empty case so it isn’t forced to invent a value just to satisfy the shape.
- Version your schemas. The shape your code expects will change; treat the schema as an API contract, version it, and migrate deliberately.
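Several of these habits fit in one small loop. The sketch below is a standard-library stand-in for what a Pydantic validator would normally do, so it runs without dependencies. Every name in it (validate_ticket, extract_ticket, the "ticket.v1" tag, the scripted stub model) is an assumption made for illustration, not any provider's API. It validates the reply, retries once with the error fed back, allows null urgency and an "unknown" category as the explicit empty case, and stamps the result with a schema version.

```python
import json
from typing import Callable

SCHEMA_VERSION = "ticket.v1"  # hypothetical version tag
CATEGORIES = {"billing", "bug", "feature", "unknown"}  # "unknown" = empty case
FIELDS = {"category", "urgency", "summary"}


def validate_ticket(raw: str) -> dict:
    """Return the parsed ticket, or raise ValueError naming the first problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or set(data) != FIELDS:
        raise ValueError(f"expected an object with exactly {sorted(FIELDS)}")
    if data["category"] not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    urgency = data["urgency"]
    is_int = isinstance(urgency, int) and not isinstance(urgency, bool)
    # null is allowed so the model is never forced to invent a number.
    if urgency is not None and not (is_int and 1 <= urgency <= 5):
        raise ValueError("urgency must be null or an integer from 1 to 5")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    return data


def extract_ticket(call_model: Callable[[str], str], text: str,
                   max_retries: int = 1) -> dict:
    """Call the model, validate, and retry with the error fed back."""
    prompt = text
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            ticket = validate_ticket(raw)
        except ValueError as exc:
            last_error = exc
            prompt = (f"{text}\n\nYour previous reply was rejected: {exc}. "
                      "Reply with corrected JSON only.")
            continue
        return {"schema_version": SCHEMA_VERSION, **ticket}
    raise RuntimeError(f"no valid ticket after retries: {last_error}")


# Stub model: first reply breaks the urgency range, second reply is valid.
scripted = iter([
    '{"category": "billing", "urgency": 9, "summary": "Payment failed"}',
    '{"category": "billing", "urgency": 5, "summary": "Payment failed"}',
])
prompts_seen = []


def stub_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(scripted)


result = extract_ticket(stub_model, "My payment failed three times.")
print(result)
```

Capping retries matters as much as having them: a reply that fails twice with the error spelled out is usually a schema or prompt problem, and raising is more useful than looping.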
The payoff
Structured outputs are the unglamorous bridge between a language model and the rest of your software. A model that returns reliable JSON or a validated function call stops being a clever text generator you have to babysit and becomes a component you can compose — its output flows straight into a database, an API, or the next step of an agent, with no human in the loop to catch the mess.
That’s the whole promise of building with LLMs as infrastructure rather than as a chat toy: outputs you can trust enough to act on automatically. You get there not by writing a more persuasive prompt, but by refusing to accept anything but the shape you asked for. Constrain the output, validate it, and let your code do what it does best — run reliably on data it understands.