RAG, GraphRAG, and Knowledge Graphs: What's Actually Different

Pratik Kulkarni
AI , Cloud Architecture
26 May, 2026
13 Mins read

LLMs are stateless. They don’t know your documents, your internal data, or what changed last week. They’re only as good as what you put in front of them. This gave rise to what’s now called context engineering — the practice of figuring out what information to retrieve, when, and how to inject it into an LLM’s context window so it can do useful work.

The first answer was simple and shipped fast:

store your documents as vector embeddings,
retrieve the most semantically similar chunks at query time,
and pass them to the model.

This is RAG — Retrieval Augmented Generation. It generalized well and became the default pattern for document Q&A. But as teams applied it to more complex use cases, specific limitations emerged that flat text retrieval could not address.

RAG treats everything as flat text. It can retrieve a paragraph, but it can’t traverse a relationship.

If a CV says “led the cloud migration” and the query is “AWS experience,” the embedding similarity may not be strong enough to fetch that document.
Multi-hop questions — find engineers who have X, worked alongside someone with Y, and are currently available — have no single passage to retrieve. The answer doesn’t exist in any document.

Two techniques emerged to address these limitations, each targeting a different root cause.

GraphRAG addresses retrieval quality. If documents are represented as a graph of entities and relationships, retrieval can follow those relationships at query time rather than relying purely on vector similarity. Microsoft released an open-source implementation in 2024. It requires no schema design — the graph is extracted automatically from unstructured documents — and produces better results on questions that span multiple entities or require relational context.

Knowledge graphs address a different problem: even with improved retrieval, the answer is still synthesised by an LLM, which makes it probabilistic by nature. For use cases where answers need to drive system actions or be fully auditable, that probabilistic step is a liability. A knowledge graph with an explicit ontology supports deterministic traversal — the query either returns the data or it doesn’t, with no synthesis step involved. The tradeoff is that the ontology must be designed upfront, which requires domain expertise and engineering investment.

This article covers how RAG, GraphRAG, and knowledge graphs each work, where they fit, and what tradeoffs they carry.

RAG — Retrieval Augmented Generation

The default starting point for most document Q&A use cases — fast to build, no schema required, and effective when the answer exists within a single passage.

How RAG Works

RAG has two distinct phases: an offline ingestion phase where your documents are processed and stored, and an online retrieval phase that runs at query time.

Ingestion: Chunking, Embedding, and Storing

Chunking is the first step, and one of the most consequential. LLMs have finite context windows, and vector search works better over focused passages than entire documents — so documents are split into smaller units called chunks before being embedded. How you split matters significantly.

The simplest approach is fixed-size chunking: split every N characters or tokens regardless of content. It’s predictable but frequently cuts sentences and ideas in half. A more robust approach is recursive or sentence-aware chunking, which respects natural boundaries like paragraphs and sentences. More advanced still is semantic chunking, which groups text by topic similarity rather than character count.

Chunk size is one of the most impactful hyperparameters in a RAG system.

Smaller chunks (128–256 tokens) produce more precise retrieval but lose surrounding context.
Larger chunks (512–1024 tokens) preserve more context per chunk but make retrieval noisier — the relevant sentence is in there somewhere, but so is a lot of unrelated text.

Most implementations also apply a chunk overlap of 10–20%, where adjacent chunks share some content, to avoid losing information that falls at a boundary.

Embedding converts each chunk into a dense vector — a list of floating-point numbers, typically 768 to 1536 dimensions, where the position in that high-dimensional space encodes semantic meaning. Chunks with similar meaning end up geometrically close to each other. This is what makes semantic search possible.

Embedding models are not interchangeable. Each model learns its own vector space with its own geometry. Embeddings produced by OpenAI’s text-embedding-3-large cannot be used in a system that was indexed with Cohere’s embed-v3, and vice versa — the two spaces are incommensurable.

This creates an operational constraint: whichever model you use at ingestion time must also be used at query time, and switching models requires re-embedding your entire corpus.

Storing the resulting vectors requires a database that can run similarity search over them. Traditional relational databases can’t do this out of the box.

If you’re already running PostgreSQL, pgvector is a practical starting point — it adds vector search as an extension and handles most production workloads without introducing new infrastructure. For the majority of use cases, it’s sufficient.

Dedicated vector databases — Pinecone, Qdrant, Weaviate — become worth evaluating when you need capabilities that pgvector doesn’t handle well: metadata filtering combined with vector search at scale, multi-tenancy, or very high query volumes.

Neo4j is worth noting separately — it’s primarily a graph database but added native vector search in v5.11, which means teams building a knowledge graph in Neo4j can also run vector retrieval within the same system without a separate vector store.

Retrieval: Query, Search, and Generation

At query time, the user’s question is embedded using the same model used during ingestion. The resulting query vector is compared against all stored chunk vectors using a similarity metric — cosine similarity is the most common — and the top-k most similar chunks are returned. The value of k (typically between 3 and 10) is a parameter that affects how much context the LLM receives.

Pure vector search has a known weakness: it struggles with exact-match terms — product codes, names, technical identifiers — where keyword matching outperforms semantic similarity. Many production systems use hybrid search, which combines vector similarity with BM25 (a keyword-based ranking algorithm), and merges the results before passing them to the LLM. This often outperforms either approach alone.

An optional but increasingly common step is re-ranking: a second-stage model (typically a cross-encoder) takes the top-k retrieved chunks and re-scores them for relevance to the specific query, reordering them before injection. This improves precision at the cost of additional latency.

The retrieved chunks are assembled into a prompt template alongside the user’s query and passed to the LLM. The LLM then generates an answer using only the provided context. The quality of the answer is a direct function of what was retrieved — if the relevant chunk wasn’t in the top-k results, the LLM has no way to recover it.

Where RAG Fits

RAG is the right default for most document Q&A use cases. It’s fast to ship, requires no schema design, and handles a wide range of questions well.

It works well for any question where the answer exists in a single passage:

Document Q&A — “What does clause 8 of this contract say?”
Customer support — “What is the refund policy for subscription plans?”
Policy lookup — “What are the leave entitlements for contractors?”
Knowledge base search — “How do I reset 2FA for a user account?”

The retrieval unit matches the answer unit — find the right chunk, and the LLM has what it needs.

RAG also makes sense when you’re still exploring your data. If you don’t yet know what structure your information has, a vector store lets you move fast. You can layer in more structure once the patterns become clear.

RAG Limitations

Chunking documents and matching by semantic similarity has natural limitations in use cases where answers are not contained within a single document or terminology is inconsistent across them.

Multi-hop queries. “Find engineers who worked alongside someone with Kafka expertise on a project that served a fintech client, and who are themselves available” — that’s three hops. RAG fetches text chunks, it can’t traverse connections. The answer doesn’t exist in any single document.

Terminology inconsistency. One CV says “Amazon Web Services”, another says “AWS”, a project brief says “cloud infrastructure”, a performance review says “led the migration to the cloud.” RAG treats these as different things — the embedding similarity isn’t strong enough to reliably connect them. You ask “who has AWS experience” and miss half your engineers because their documents used different words.

If you’re hitting these limitations at scale, GraphRAG or a knowledge graph are worth evaluating depending on how much structure your use case demands.

GraphRAG

An extension of RAG that represents documents as a graph of entities and relationships, enabling retrieval that follows connections rather than relying purely on vector similarity. Microsoft released an open-source implementation in 2024.

How GraphRAG Works

GraphRAG has the same two-phase structure as RAG — an offline construction phase and an online query phase — but what happens in each phase is substantially different.

Graph Construction (Offline)

Where RAG chunks and embeds text, GraphRAG processes each chunk with an LLM to extract structured information from it. For every chunk, the LLM identifies entities (people, organisations, locations, concepts, events) and the relationships between them. Each relationship is stored as a labelled edge: “Pratik works at ThinkXL”, “ThinkXL is a cloud consultancy”. The result is a graph built entirely from your documents, with no schema designed upfront.

Because the same entity can appear under different names across documents — “Google”, “Google LLC”, “Alphabet” — GraphRAG performs entity resolution: it merges references to the same real-world entity into a single canonical node. This is done automatically using LLM-based comparison, and it’s one of the places where extraction quality directly affects graph quality.

Once the graph is built, GraphRAG runs community detection — specifically the Leiden algorithm — to partition entities into clusters of closely related nodes. These clusters are called communities. An LLM then generates a community summary for each cluster, capturing the key themes and relationships within it. These summaries are stored as part of the index and are what enable global questions that RAG cannot answer at all.

This construction phase is significantly more expensive than building a RAG vector index. Every chunk requires LLM calls for extraction, and every community requires a summarization call. RAG indexing uses cheap embedding API calls; GraphRAG indexing uses LLM completion calls per chunk.

Query: Local and Global Search

GraphRAG supports two distinct query modes, which is a key difference from RAG’s single retrieval path.

Local search is used for questions about specific entities or their relationships. The query is used to identify relevant entities via vector search on entity embeddings. GraphRAG then traverses the local neighborhood of those entities — gathering adjacent nodes, relationships, associated text chunks, and community summaries — and assembles this as context for the LLM.

Global search is used for high-level questions that require reasoning across the entire corpus — “What are the major themes in this collection of research papers?” No single passage or local neighborhood can answer this. Instead, GraphRAG uses community summaries in a map-reduce pattern: it generates partial answers from each relevant community summary, then reduces them into a final answer. This is the query mode where GraphRAG most clearly outperforms traditional RAG; RAG has no equivalent mechanism for this class of question.

In both modes, the final answer is still generated by an LLM synthesising the retrieved context. The retrieval is graph-structured, but the output remains probabilistic.

Where GraphRAG Fits

GraphRAG’s main advantage is that it improves retrieval quality without requiring you to design a schema. You feed it unstructured documents and it automatically extracts entities and relationships. No ontology work upfront.

It is a strong fit when your data lives in documents and relationships between entities matter:

Literature review — “What are the recurring themes and connections across these 200 research papers?”
Document analysis — “Which entities appear most frequently in connection with the acquisition discussions?”
Q&A over a large unstructured corpus — “Which reports mention both the regulatory change and the APAC expansion strategy?”

If you need to surface connections that traditional RAG misses, GraphRAG is a meaningful improvement with a relatively low setup cost.

GraphRAG Limitations

Most of GraphRAG’s limitations stem from the fact that the graph is auto-extracted and the answer is still LLM-generated.

Probabilistic output. GraphRAG’s graph improves what gets retrieved, but the final answer is still synthesised by an LLM. For use cases where the answer needs to drive a system action — trigger a workflow, return a structured result, enforce a business rule — probabilistic synthesis is not sufficient. You need a fact, not an interpretation.

No direct query interface. GraphRAG does not expose its graph for structured traversal. There is no Cypher interface, no SPARQL endpoint, no way to ask “return all entities connected to this node by this relationship type.” The graph exists solely to improve retrieval — it is an internal mechanism, not a queryable data layer.

Inconsistent extraction. Because entity extraction and relationship identification are LLM-driven, the same document can produce different graph structures across runs. There is no schema enforcing what gets extracted, no validation layer, and no way to guarantee that a concept represented one way in one document will be resolved the same way in another. For any system that relies on the graph being stable and predictable — compliance, auditing, workflow automation — this is a structural limitation.

These limitations share a common root: the graph was built automatically from unstructured text, not designed to model a specific domain. When a use case requires deterministic answers, live data, structured queries, or auditable reasoning, the right tool is a knowledge graph that was built with intent.

How Is GraphRAG Different from a General Knowledge Graph?

The difference is purpose.

General knowledge graphs — you design the ontology, define node types, relationships, ingest structured data. The graph is precise, queryable, deterministic. You control exactly what goes in.

GraphRAG — you feed it unstructured documents and it automatically extracts entities and relationships to build the graph. No manual ontology design. The graph is built for retrieval, not for precise traversal.

The tradeoff is real in both directions. GraphRAG is faster to get started and requires no domain modeling. A hand-crafted KG requires upfront investment but gives you precision and control over what’s in the graph.

Knowledge Graphs

A schema-driven approach where entities and relationships are explicitly modelled — providing deterministic traversal, repeatable results, inspectable reasoning, and a queryable data layer that behaves consistently across runs.

How Knowledge Graphs Work

A knowledge graph is built around three things: nodes (entities), edges (relationships between them), and properties (attributes on both). Unlike GraphRAG, where the graph emerges from documents, a knowledge graph is designed — you decide what goes in it before any data is loaded.

Schema and Ontology Design

The ontology defines the vocabulary of your graph: what entity types exist, what relationship types connect them, and what properties each can carry. For a professional services firm this might be: Person, Skill, Project, Client, Availability — with relationships like HAS_SKILL, WORKED_ON, SERVED, REPORTS_TO. Every node and edge in the graph conforms to this schema.

This is where the upfront investment lives. Ontology design is a domain modelling problem before it is an engineering one — it requires understanding the business domain well enough to decide which entities matter, which relationships are worth capturing, and what level of granularity is useful. Getting this wrong is expensive: a poorly designed schema limits what queries are possible and is hard to restructure once data is loaded.

Data Ingestion

Data can come from structured sources — HR systems, project management tools, databases, APIs — or from unstructured documents. What changes is how it gets in.

ETL pipelines transform structured records into the graph’s node and edge types and load them incrementally. When the source data changes — a person’s availability updates, a project closes — the corresponding nodes and edges are updated.

For unstructured documents, the LLM is given the ontology and asked to extract only entities and relationships that conform to the defined schema. The output is validated against the schema before ingestion. This keeps the graph consistent (though it requires explicit prompt engineering and validation logic).

Querying with Cypher

Cypher is the query language for labeled property graphs (Neo4j’s model, also used by others). Queries describe patterns to match against the graph rather than search terms to retrieve. A single-hop query looks like:

MATCH (p:Person)-[:HAS_SKILL]->(s:Skill {name: "Kafka"})
RETURN p.name

A multi-hop query follows relationships across nodes:

MATCH (p:Person)-[:HAS_SKILL]->(s:Skill {name: "Kafka"}),
      (p)-[:WORKED_ON]->(proj:Project)-[:SERVED]->(c:Client {industry: "fintech"}),
      (p)-[:HAS_STATUS]->(a:Availability {status: "available"})
RETURN p.name

The graph either matches the pattern or it doesn’t. There is no scoring, no ranking, no synthesis. The result is the data.

Where Knowledge Graphs Fit

A knowledge graph is the right fit when the answer needs to drive a decision or trigger an action — not just inform a reader.

Deterministic answers — “Is this engineer available?” A graph traversal either finds the node or it doesn’t — no synthesis, no interpretation.
Multi-hop traversal over live data — “Find engineers with Kafka expertise currently available on a fintech project.” The graph connects to live data — availability changes, projects close, people join teams.
Auditability — “Show exactly what produced this result.” The exact traversal path that produced the answer is inspectable.
Triggering workflows — “Return all contracts expiring this month.” A graph query returns structured data your systems can consume — not a paragraph of generated text.

The ontology investment is justified when the domain is well-understood, the entities are clearly defined, and their relationships are stable. If the domain is exploratory or the schema is still evolving, a knowledge graph is the wrong choice regardless of scale.

Knowledge Graph Limitations

The properties that make knowledge graphs precise — explicit schema, defined relationships, deterministic traversal — also create constraints that matter in practice.

Schema rigidity. The ontology you design upfront becomes the contract your entire graph is built on. Adding new entity types or relationship types later requires schema migration, re-ingestion of affected data, and updates to query logic. In domains where the data model is still evolving, this rigidity can turn into operational cost.

Blind beyond its ontology. A knowledge graph only knows what was explicitly put into it and does not generalise beyond its schema. If a user asks “who knows stream processing” but the ontology only has a skill node for “Kafka” not “stream processing”, the query returns nothing — even though the source data contains that information. This makes it powerful within its scope and completely blind outside it.

Choosing the Right Approach

The three techniques address different problems. The right starting point depends on what kind of question your system needs to answer and what the answer needs to do.

A useful first filter is the shape of the data itself. Some data is naturally a document — a contract, a support article, a policy. The content is the thing, and the goal is to find the relevant passage. Other data is naturally a graph — an org chart, a dependency tree, a regulatory framework where rules reference standards that supersede directives, a causal chain of incidents where one failure triggered the next. When graph-shaped data is flattened into documents and embedded, the structure that makes it useful is discarded; the LLM then has to reconstruct that structure from text at query time, and it does so imperfectly, because the structure was never in the text to begin with. The same applies to agentic systems: an agent executing a multi-step task is more reliable querying explicit relationships in a structured world model than inferring structure from retrieved chunks at every step.

The tell: if your hardest questions are “find me relevant information about X,” you have document-shaped data. If they’re “given X, what is connected to it, how, and through what path,” you have graph-shaped data.

Start with RAG if:

The answer to your question exists within a single document or passage
You are still exploring your data and don’t yet know its structure
Speed of implementation matters

Move to GraphRAG if:

Your questions require connecting information across multiple documents
You have unstructured documents with implicit relationships worth surfacing
You want better retrieval quality without the cost of designing a schema upfront
Probabilistic answers are still acceptable — you need better retrieval, not determinism

Build a Knowledge Graph if:

You need deterministic, auditable answers — the traversal path must be inspectable
Your queries span multiple hops over live or frequently updated data
You have a well-understood domain with stable entities and relationships worth modelling

RAG, GraphRAG, and knowledge graphs are often discussed as if they sit on a spectrum. But they don’t. They are fundamentally different in what they model, how they answer questions, and what guarantees they provide. Treating them as interchangeable, or defaulting to one without understanding the others, leads to systems that are either incapable or over-engineered.

RAG, GraphRAG, and Knowledge Graphs: What's Actually Different

RAG — Retrieval Augmented Generation

How RAG Works

Where RAG Fits

RAG Limitations

GraphRAG

How GraphRAG Works

Where GraphRAG Fits

GraphRAG Limitations

How Is GraphRAG Different from a General Knowledge Graph?

Knowledge Graphs

How Knowledge Graphs Work

Where Knowledge Graphs Fit

Knowledge Graph Limitations

Choosing the Right Approach

Tags:

Share:

Not sure if you need RAG, GraphRAG, or a knowledge graph?

Related Posts

AgentCore vs Bedrock Agents vs Roll-Your-Own on Fargate

AWS Bedrock vs SageMaker: How to Pick the Right One

Deploying Engineering Resource Management Knowledge Graph on AWS

What Is a Knowledge Graph?

What Is Amazon Bedrock AgentCore? (And When to Use It)

What Is AI FinOps?