GraphRAG: When Vector Search Isn’t Enough
Vector RAG finds paragraphs that look similar to your question. But some questions are about how things connect — and for those you need a graph. Here’s when plain retrieval breaks, and how GraphRAG fixes it.
Vector RAG works by similarity: embed the question, find the chunks whose embeddings sit closest, stuff them into the prompt. For the huge class of questions that are really just “find the passage that answers this,” it’s the right tool and you should use it. But there’s a kind of question it quietly fails — the kind whose answer isn’t sitting in any single chunk, but spread across many, connected by relationships the embeddings never captured.
Ask “which of our suppliers are affected if this factory goes down?” and similarity search returns chunks that mention factories and suppliers, not the chain of dependencies that actually answers it. GraphRAG is the response to that gap. It builds a literal map of the entities in your data and how they relate, then lets the model traverse that map instead of guessing from look-alike paragraphs. In 2026 it has become a standard upgrade for questions that are about connections rather than look-alike text.
Where plain RAG breaks down
Similarity retrieval has a blind spot, and it’s worth naming exactly when you’ll hit it — because most of the time you won’t, and reaching for a graph too early just adds cost.
- Multi-hop questions. “Who reports to the person who approved this budget?” needs two linked facts; similarity finds each in isolation and never connects them.
- Global questions. “What are the main themes across all these documents?” has no single chunk to retrieve; the answer is a property of the whole corpus.
- Scattered facts. When the pieces of an answer live in ten different documents, top-k retrieval grabs a few and silently drops the rest.
- Relationship questions. Anything phrased as “how is X connected to Y” is asking about edges, and a chunk embedding captures what a passage is about, not the explicit links between the entities in it.
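To make the multi-hop failure concrete, here is a minimal sketch that uses bag-of-words cosine similarity as a crude stand-in for real embeddings. The chunks, the question, and the helper names (`bow`, `cosine`, `top_k`) are all made up for illustration. Real embeddings handle synonyms far better than this, but the structural problem is the same: the chunk that completes the chain does not have to resemble the question.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector; a crude stand-in for an embedding."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical corpus: the answer needs the first two chunks chained together.
chunks = [
    "Factory A produces the X200 gearbox.",
    "Acme Metals supplies steel for the X200 gearbox.",
    "Factory B goes down for maintenance every August.",
]

def top_k(question, k=2):
    """Return the k chunks most similar to the question."""
    q = bow(question)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

question = "Which suppliers are affected if Factory A goes down?"
retrieved = top_k(question, k=2)
for chunk in retrieved:
    print(chunk)
```

With k=2, the look-alike Factory B chunk ranks first and the Factory A chunk second. The Acme Metals chunk, which holds the second hop, shares no words with the question and never makes the cut.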
What GraphRAG adds
GraphRAG doesn’t replace vector search — it layers a knowledge graph on top of it. You extract the entities from your documents (people, companies, products, concepts) and the relationships between them (works-for, depends-on, part-of), and store that as a graph alongside the embeddings. Now retrieval can do two things at once: find the relevant starting points by similarity, then walk the graph to pull in everything connected to them.
That traversal is the whole point. Instead of hoping the right ten chunks happen to be similar to the question, you start from the entities the question mentions and follow the edges — one hop for direct relationships, more for chains. The model gets a connected subgraph instead of a bag of loosely related paragraphs, which is exactly what multi-hop and global questions need.
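The two-step retrieval described above can be sketched roughly as follows. This is a toy under stated assumptions: the graph contents are invented, `entry_points` uses plain name matching where a real system would use vector search, and `expand` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical graph: entity -> list of (relation, other_entity).
graph = {
    "Factory A": [("produces", "X200 gearbox")],
    "X200 gearbox": [("inverse_produces", "Factory A"), ("inverse_supplies", "Acme Metals")],
    "Acme Metals": [("supplies", "X200 gearbox")],
    "Factory B": [("produces", "Z10 pump")],
    "Z10 pump": [("inverse_produces", "Factory B")],
}

def entry_points(question, graph):
    """Stand-in for similarity search: entities whose name appears in the question."""
    q = question.lower()
    return [entity for entity in graph if entity.lower() in q]

def expand(graph, starts, hops=2):
    """Breadth-first walk from the entry points; returns every edge seen within `hops`."""
    seen = set(starts)
    queue = deque((s, 0) for s in starts)
    facts = []
    while queue:
        entity, depth = queue.popleft()
        if depth == hops:
            continue  # depth cap reached: do not expand this entity further
        for relation, other in graph.get(entity, []):
            facts.append((entity, relation, other))
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return facts

question = "Which suppliers are affected if Factory A goes down?"
starts = entry_points(question, graph)
facts = expand(graph, starts, hops=2)

# The connected facts become the context handed to the model.
context = "\n".join(f"{a} --{rel}--> {b}" for a, rel, b in facts)
print(context)
```

Starting from Factory A, two hops reach Acme Metals through the gearbox, while Factory B and its pump stay out of the context because nothing connects them to the entry point.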
Building the graph
The graph is usually built with an LLM as the extractor: feed it each chunk, ask it to pull out entities and the relationships between them, and accumulate the results. The shape is simpler than it sounds — nodes and edges you can store in a graph database or even a dictionary to start:
# Build a tiny knowledge graph from text with an LLM extractor.
# Assumes `documents` (a list of text chunks) and `extract_triples` (the LLM call) are defined elsewhere.
graph = {}  # entity -> list of (relation, other_entity)

def add_edge(a, relation, b):
    graph.setdefault(a, []).append((relation, b))
    graph.setdefault(b, []).append((f"inverse_{relation}", a))

# An LLM reads each chunk and returns (subject, relation, object) triples.
for chunk in documents:
    for subj, rel, obj in extract_triples(chunk):  # one LLM call per chunk
        add_edge(subj, rel, obj)

# Answer a multi-hop question by walking out from the entities it mentions.
def neighbors(entity, hops=2):
    seen, frontier = {entity}, [entity]
    for _ in range(hops):
        # Skip entities already visited; inverse edges would otherwise lead straight back.
        frontier = list({b for e in frontier for (_rel, b) in graph.get(e, [])} - seen)
        seen.update(frontier)
    return seen - {entity}

# neighbors("Factory A") -> every supplier, part, and order within two hops

In production you’d swap the dictionary for a real graph database and add similarity search to find the entry points, but the core never changes: extract entities and relationships up front, then traverse them at query time. The expensive part is the extraction; the traversal is cheap.
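The extractor itself can be sketched too. This is one possible shape, not a specific library’s API: `llm` is assumed to be any callable that takes a prompt string and returns the model’s text reply, the prompt wording is illustrative, and `fake_llm` is a canned stand-in so the example runs without a model. Unlike the one-argument call in the loop above, this version takes the model as a second argument; `functools.partial` would bind it.

```python
import json

# Illustrative prompt; the wording is an assumption, not a tested recipe.
PROMPT = (
    "Extract factual relationships from the text below. "
    "Reply with only a JSON list of [subject, relation, object] triples.\n\n"
    "Text: {chunk}"
)

def extract_triples(chunk, llm):
    """Ask `llm` (prompt str -> reply str) for triples and keep only well-formed ones."""
    reply = llm(PROMPT.format(chunk=chunk))
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: drop it rather than guess
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        # A valid triple is a list of exactly three non-empty strings.
        if (isinstance(item, list) and len(item) == 3
                and all(isinstance(x, str) and x.strip() for x in item)):
            triples.append(tuple(x.strip() for x in item))
    return triples

def fake_llm(prompt):
    """Canned reply standing in for a real model call; the second item is malformed."""
    return '[["Acme Metals", "supplies", "X200 gearbox"], ["Factory A", "produces"]]'

triples = extract_triples("Acme Metals supplies steel for the X200 gearbox.", fake_llm)
print(triples)
```

Shape checks like these only catch malformed output. They say nothing about whether a well-formed triple is true, so sampling extracted edges against their source text is still needed.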
Vector search answers “what looks like my question?” A graph answers “what is connected to my question?” Most failures of RAG are really the first tool being asked the second question.
Where teams get it wrong
- Using it for everything. Most questions are single-hop lookups that plain RAG nails. A graph is overhead you only want when the questions are genuinely about connections.
- Trusting the extraction blindly. The LLM that builds your graph makes mistakes — wrong entities, hallucinated relationships. Garbage edges produce confident wrong traversals; sample and check the graph.
- Letting the graph go stale. Documents change; the graph built from them doesn’t, unless you rebuild. Plan for re-extraction the same way you plan to re-embed.
- Traversing too far. Every hop multiplies the subgraph. Three hops on a dense graph can pull in half your data — cap the depth and prune low-weight edges.
- Skipping the vector layer. GraphRAG works best with both: similarity to find where to start, the graph to expand from there. Dropping either half throws away its strength.
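The depth cap and edge pruning from the list above might look like the sketch below. It assumes each edge carries a weight, for example the extractor’s confidence or how many chunks support the relationship. The weights, graph contents, and the `bounded_neighbors` name are hypothetical.

```python
# Hypothetical weighted graph: entity -> list of (relation, other_entity, weight).
graph = {
    "Factory A": [("produces", "X200 gearbox", 0.9), ("mentioned_with", "Annual report", 0.1)],
    "X200 gearbox": [("inverse_supplies", "Acme Metals", 0.8)],
    "Acme Metals": [("located_in", "Ohio", 0.7)],
    "Annual report": [("mentions", "Factory B", 0.9)],
}

def bounded_neighbors(graph, start, max_hops=2, min_weight=0.5):
    """Entities reachable within `max_hops`, following only edges at or above `min_weight`."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for entity in frontier:
            for _relation, other, weight in graph.get(entity, []):
                # Prune weak edges and never revisit an entity.
                if weight >= min_weight and other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return seen - {start}

reached = bounded_neighbors(graph, "Factory A", max_hops=2, min_weight=0.5)
print(sorted(reached))  # ['Acme Metals', 'X200 gearbox']
```

The weak “mentioned_with” edge is pruned, so Factory B never enters the subgraph, and Ohio sits three hops out, beyond the cap. Raising `max_hops` to 3 pulls Ohio in, which is the growth the depth cap exists to control.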
When to reach for it
Default to plain vector RAG. It’s simpler, cheaper, and right for the majority of question-answering. Reach for GraphRAG when you keep seeing the same failure: the answer requires connecting facts that live in different places, and similarity search keeps handing back fragments that each look relevant but don’t add up. That’s the signal that your problem is about relationships, and relationships need a structure that stores them.
The irony of GraphRAG is that it’s old ideas — knowledge graphs predate the LLM boom by decades — meeting new ones. What changed is that an LLM can now build the graph for you automatically, turning a research-grade technique into something a small team can stand up in an afternoon. When your questions are about how things connect, that afternoon is the difference between answers that look right and answers that are.