
GraphRAG in Practice: The 'Nodes vs Edges' Test for When Your RAG Needs a Graph

Most GraphRAG advice tells you what a knowledge graph is. This is the decision I actually make before building one: is the question about a thing, or about a path between things? Here's the test, a supply-chain example traced hop by hop, and the four production gotchas that eat the afternoon nobody budgets for.

Dhileep Kumar | Jun 12, 2026 | 7 min read

There is a version of this article on every AI blog: vector search finds similar chunks, graphs find connected ones, here is a diagram of nodes and edges. It is all true and none of it helps you decide anything. So I want to skip the definitions and give you the one question I actually ask before I let anyone bolt a graph database onto a retrieval stack. The question is this: is the user asking about a thing, or about a path between things? That single distinction predicts whether GraphRAG will earn its keep or just add a moving part you will forget to maintain. Everything below is built around making that call quickly and correctly.

The nodes-vs-edges test

Embeddings encode nodes. A vector is a compressed description of one thing: a paragraph, a product, a person. Similarity search is therefore excellent at 'find me the node most like this description.' What a vector fundamentally cannot store is a relationship, because a relationship is not a property of one thing; it is a fact that lives between two things. That is an edge, and edges are exactly what a graph is for.

So before reaching for GraphRAG, rewrite the user's question in your head as a sentence and look at the verb. If the verb describes an attribute ('what does this contract say about termination?') you are asking about a node, and plain RAG is the right and cheaper tool. If the verb describes a connection ('which contracts inherit this termination clause through a parent agreement?') you are asking about edges, and no amount of better embeddings will save you.

Run that test honestly and you will find most of your questions are node questions. That is the point. GraphRAG is not an upgrade you apply everywhere; it is a targeted fix for the minority of questions whose answer is a path.

A worked example: the factory-outage question

Take the classic case: 'Which of our customers are at risk if the Chennai plant goes offline?' Say your corpus is a few hundred internal docs - supplier contracts, a bill-of-materials spreadsheet exported to prose, customer order histories. Watch what each approach does with it.

Plain vector RAG embeds the question and pulls the top chunks. What comes back? The paragraph describing the Chennai plant, a couple of supplier blurbs that mention 'risk,' maybe a customer FAQ about delays. Each chunk is individually relevant and the set is collectively useless, because the answer - Chennai makes component X, component X goes into product Y, product Y ships to customers A, B and C - is a three-hop chain and no single chunk contains it. The model, handed fragments, will confidently name whichever customer happened to appear in the retrieved text. That is the failure mode: not a wrong-looking answer, a plausible-looking wrong answer.

Now trace it as a graph. Start at the node the question names, then follow edges:

Chennai plant - produces -> Component X (hop 1)
Component X - part-of -> Product Y (hop 2)
Product Y - ordered-by -> Customers A, B, C (hop 3)

Three hops and you have the actual answer, with the reasoning path attached so the model can explain it. The graph did not need any chunk to contain the whole story; it reconstructed the story by walking edges that were each extracted from a different document. That reconstruction is the entire value proposition - everything else is plumbing.
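The trace above can be sketched in a few lines of Python. Treat this as a toy illustration rather than a schema: the `edges` dictionary, the relation names, and the `walk_chain` helper are all hypothetical stand-ins for whatever your extractor actually produces.

```python
# Hypothetical toy graph for the Chennai example: {source: [(relation, target), ...]}
edges = {
    "Chennai plant": [("produces", "Component X")],
    "Component X": [("part-of", "Product Y")],
    "Product Y": [
        ("ordered-by", "Customer A"),
        ("ordered-by", "Customer B"),
        ("ordered-by", "Customer C"),
    ],
}

def walk_chain(edges, start, relations):
    """Follow one relation type per hop; return (nodes reached, facts used)."""
    frontier = [start]
    facts = []
    for relation in relations:
        next_frontier = []
        for node in frontier:
            for rel, target in edges.get(node, []):
                if rel == relation:
                    facts.append((node, rel, target))
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier, facts

at_risk, facts = walk_chain(
    edges, "Chennai plant", ["produces", "part-of", "ordered-by"]
)
print(at_risk)  # the customers at the end of the chain
for source, rel, target in facts:
    print(f"{source} -[{rel}]-> {target}")  # the reasoning path, edge by edge
```

The `facts` list is the part worth passing to the model: it is the explanation, not just the answer.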

Vector search answers 'what looks like my question?' A graph answers 'what is reachable from my question?' Most RAG failures are the first tool being handed the second kind of question.

The traversal, and the bug hiding inside it

The graph itself is unglamorous: nodes and typed edges you build once with an LLM extractor, then walk at query time. The tutorials stop at a two-line lookup and imply traversal is trivial. It is not - the naive walk has a failure baked in that only shows up on real, dense graphs. Here is a bounded breadth-first traversal that makes the problem visible:

python

from collections import deque

# edges: {source: [(relation, target, weight), ...]}
def traverse(graph, start, max_hops=3, min_weight=0.3, max_nodes=50):
    seen = {start}
    frontier = deque([(start, 0)])
    path_facts = []
    while frontier:
        node, hop = frontier.popleft()
        if hop >= max_hops:
            continue
        for relation, target, weight in graph.get(node, []):
            if weight < min_weight:        # prune noisy edges
                continue
            path_facts.append((node, relation, target))
            if target not in seen and len(seen) < max_nodes:
                seen.add(target)           # cap the blast radius
                frontier.append((target, hop + 1))
    return path_facts

# Without max_nodes and min_weight this quietly returns half your corpus.

The dangerous lines are the ones missing from most examples: the caps. Every hop multiplies the frontier by the average out-degree of your nodes. On a graph where popular entities have dozens of edges - a common vendor, a shared parent company, the token 'AWS' - three unbounded hops does not return a tidy subgraph, it returns a large fraction of everything. You then stuff that into a context window, pay for the tokens, and drown the model in the very noise you built the graph to avoid. So the two parameters that look like tuning knobs, max_nodes and min_weight, are not optional polish - they are the difference between a subgraph and a data dump. Set them before you have a problem, because the day you notice is the day a hub node blows up your latency and your bill at once.
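To put rough numbers on that multiplication: with an average out-degree of d, h unbounded hops can reach on the order of 1 + d + d^2 + ... + d^h nodes. The sketch below checks that on a synthetic tree where every node has exactly 20 children. The `make_tree` and `count_reached` helpers are hypothetical and exist only for this illustration; real graphs have cycles and uneven degree, so take the figure as an upper-bound intuition, not a measurement.

```python
from collections import deque

def make_tree(branching, depth):
    """Synthetic graph: every node has `branching` children, down to `depth` levels."""
    graph, level, next_id = {}, [0], 1
    for _ in range(depth):
        new_level = []
        for node in level:
            children = list(range(next_id, next_id + branching))
            next_id += branching
            graph[node] = children
            new_level.extend(children)
        level = new_level
    return graph

def count_reached(graph, start, max_hops, max_nodes=None):
    """Breadth-first count of nodes reached, optionally capped at max_nodes."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hop = frontier.popleft()
        if hop >= max_hops:
            continue
        for target in graph.get(node, []):
            if target in seen:
                continue
            if max_nodes is not None and len(seen) >= max_nodes:
                return len(seen)  # cap hit: stop expanding entirely
            seen.add(target)
            frontier.append((target, hop + 1))
    return len(seen)

graph = make_tree(branching=20, depth=3)
uncapped = count_reached(graph, 0, max_hops=3)
capped = count_reached(graph, 0, max_hops=3, max_nodes=50)
print(uncapped)  # 1 + 20 + 400 + 8000 = 8421
print(capped)    # 50
```

Twenty edges per node is not an exotic hub, and three hops already lands in the thousands.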

When to reach for it (and when not to)

Treat the list below as a rule of thumb, not a benchmark - it maps question shapes to the right tool, and the trade-off underneath it is fixed: graphs pay a large one-time extraction cost and add per-query traversal complexity, in exchange for answers plain RAG structurally cannot produce.

Single-fact lookup ('what's the notice period?') -> plain vector RAG. A graph buys you nothing and costs you an extraction pipeline.
Multi-hop ('who approved the budget that funded this hire?') -> GraphRAG. Two linked facts vector search finds separately and never joins.
Global / thematic ('what themes run across all incident reports?') -> GraphRAG or a summarization pass. No single chunk is the answer; it's a property of the whole set.
Scattered-fact aggregation ('list every subsidiary that touches this vendor') -> GraphRAG. Top-k silently drops the tail; traversal doesn't.
High-churn corpus that changes hourly -> lean vector RAG unless the questions truly need edges; graphs are expensive to keep fresh (see below).

Four gotchas the docs won't tell you

The extraction demo always works. Production is where the interesting failures live, and they are not the ones the getting-started guide prepares you for.

Entity resolution is the real job. 'Acme Corp,' 'Acme,' and 'ACME Corporation' are one node to a human and three nodes to a naive extractor - which means your graph silently splits into disconnected islands and traversals stop short of the answer. Nobody warns you that most of the engineering in a real GraphRAG build is deduplicating and canonicalizing entities, not the graph algorithms. Budget for it.
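A first pass at canonicalization might look like the sketch below. It is deliberately naive: the suffix list is an assumption I made up for the example, and `canonical_key` and `merge_entities` are hypothetical helpers. String normalization like this catches the 'Acme' variants but not aliases or abbreviations, which usually need a lookup table or a similarity check on top.

```python
import re
from collections import defaultdict

# Assumed suffix list for illustration; a real one is longer and domain-specific.
CORPORATE_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "ltd", "llc", "co"}

def canonical_key(name):
    """Lowercase, drop punctuation, and strip trailing corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while len(tokens) > 1 and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def merge_entities(names):
    """Group raw extracted names under one canonical key each."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return dict(groups)

merged = merge_entities(["Acme Corp", "Acme", "ACME Corporation", "Apex Ltd."])
print(merged)
```

Each key becomes one node, and the raw names are kept as aliases so you can audit a bad merge later.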

Edge weights are noise until you make them signal. The LLM will happily emit a 'related-to' edge between two entities that co-occurred in one sentence by coincidence. Give every edge a weight and a provenance (which chunk produced it), then let traversal prune on weight. An unweighted graph treats a load-bearing dependency and a throwaway mention as equals, and your walks wander.
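One way to carry weight and provenance together is sketched below. The `Edge` class, the `add_mention` helper, the chunk ids, and especially the scoring rule (each extra supporting chunk halves the remaining doubt) are assumptions for illustration; pick whatever scoring fits your extractor.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    source: str
    relation: str
    target: str
    provenance: set = field(default_factory=set)  # chunk ids that asserted this edge

    @property
    def weight(self):
        # Assumed scoring: 1 chunk -> 0.5, 2 chunks -> 0.75, saturating toward 1.0.
        return 1 - 0.5 ** len(self.provenance)

def add_mention(edges, source, relation, target, chunk_id):
    """Record that chunk_id asserted (source, relation, target)."""
    key = (source, relation, target)
    edge = edges.setdefault(key, Edge(source, relation, target))
    edge.provenance.add(chunk_id)
    return edge

edge_index = {}
# Hypothetical chunk ids; two independent documents support the first edge.
add_mention(edge_index, "Chennai plant", "produces", "Component X", "contract-12#3")
add_mention(edge_index, "Chennai plant", "produces", "Component X", "bom-export#41")
add_mention(edge_index, "Chennai plant", "related-to", "AWS", "newsletter#7")

strong = [e for e in edge_index.values() if e.weight >= 0.6]
print([(e.relation, e.target, e.weight) for e in strong])
```

The one-off 'related-to' mention stays in the graph with its provenance intact, but a traversal that prunes on weight never follows it.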

Staleness is worse for graphs than for embeddings. Re-embedding a changed document is a local, cheap operation - one chunk, one vector. Re-extracting is global: a new document can create edges to entities extracted months ago, so you cannot just append, you have to reconcile. Teams plan a re-embedding cadence and forget that a graph needs a re-extraction cadence - and a stale edge doesn't degrade gracefully, it confidently routes you to a supplier you dropped last quarter.
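If every edge records which documents support it, reconciliation can be sketched as retract-then-assert. The `reconcile` function, the `edge_sources` layout, and the document ids below are hypothetical. The full scan over all edges is there for clarity; a real store would likely keep an index from document id to edges.

```python
def reconcile(edge_sources, doc_id, new_triples):
    """Replace everything doc_id previously asserted with new_triples.

    edge_sources maps (source, relation, target) -> set of supporting doc ids.
    """
    # 1. Retract the document's old support from every edge.
    for triple in list(edge_sources):
        edge_sources[triple].discard(doc_id)
        if not edge_sources[triple]:
            del edge_sources[triple]  # no remaining evidence: the edge is stale
    # 2. Assert whatever the re-extracted document says now.
    for triple in new_triples:
        edge_sources.setdefault(triple, set()).add(doc_id)
    return edge_sources

edge_sources = {
    ("Supplier S", "supplies", "Component X"): {"supplier-contract"},
    ("Chennai plant", "produces", "Component X"): {"supplier-contract", "bom-export"},
}
# The contract was rewritten: Supplier S is out, Supplier T is in.
reconcile(edge_sources, "supplier-contract",
          [("Supplier T", "supplies", "Component X")])
print(sorted(edge_sources))
```

The dropped supplier disappears because nothing else vouches for it, while the Chennai edge survives on the strength of the second document.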

Don't drop the vector layer. The best setup is hybrid: use similarity to find the entry-point nodes (the user rarely names entities exactly as your graph stores them), then traverse from there. Pure graph retrieval fails at the front door because it can't fuzzy-match the question to a starting node; pure vector retrieval fails at the connections. GraphRAG is both halves, and teams that pick a side get the weaknesses of one without the strength of the other.
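The two halves can be sketched together as follows. To keep the example self-contained, `entry_points` uses plain token overlap as a stand-in for embedding similarity; in a real stack that function would be a vector search over node descriptions. The graph, the helper names, and the scoring are all assumptions for illustration.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def entry_points(question, node_names, top_k=1):
    """Stand-in for vector search: rank nodes by the share of their name tokens in the question."""
    q = tokens(question)
    scored = []
    for name in node_names:
        name_tokens = tokens(name)
        score = len(name_tokens & q) / len(name_tokens) if name_tokens else 0.0
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [name for _, name in scored[:top_k]]

def reachable_facts(graph, start, max_hops=3):
    """Bounded walk from an entry node; returns (source, relation, target) facts."""
    seen, frontier, facts = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for relation, target in graph.get(node, []):
                facts.append((node, relation, target))
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return facts

graph = {
    "Chennai plant": [("produces", "Component X")],
    "Component X": [("part-of", "Product Y")],
    "Product Y": [("ordered-by", "Customer A"), ("ordered-by", "Customer B")],
}
node_names = ["Chennai plant", "Component X", "Product Y", "Customer A", "Customer B"]
question = "Which of our customers are at risk if the Chennai plant goes offline?"

starts = entry_points(question, node_names)       # similarity picks the front door
hybrid_facts = reachable_facts(graph, starts[0])  # traversal does the joining
print(starts, hybrid_facts)
```

Similarity only has to get you to the right starting node; it never has to find the answer itself.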

GraphRAG is old technology - knowledge graphs predate the transformer by decades - that became newly practical because an LLM can now build the graph for you in an afternoon instead of a research quarter. That accessibility is real and worth using. But 'you can build it in an afternoon' is not the same as 'you should,' and the afternoon that builds the graph is not the afternoon that keeps it correct. So run the test. Write the question as a sentence, look at the verb, and ask whether you are querying a thing or a path. If it's a thing, stay with vector RAG and enjoy being cheap and simple. If it's a path - and you can feel it, because similarity keeps handing you fragments that each look right and refuse to add up - then you have a genuine edge problem, and edges need a structure that stores them. That is the whole decision, and now you can make it in ten seconds instead of a sprint.
