Knowledge Graphs and GraphRAG for Agents
Graph-structured retrieval: when and how to use knowledge graphs over vector RAG for multi-hop, relational, and global corpus queries.
The problem vector RAG does not solve
Standard vector RAG retrieves chunks by embedding similarity. That works well for single-hop, fact-lookup questions ("What does X say about Y?") but fails on three distinct problem classes:
- Multi-hop / relational — "How is entity A connected to entity C through B?" requires traversing a path that pure cosine similarity cannot reconstruct.
- Global summarization — "What are the main themes across this whole corpus?" requires aggregating over the entire document set, not retrieving a few chunks.
- Cross-document entity linking — the same person, organization, or concept appears under different surface forms across documents; a graph merges these into one node.
Knowledge graphs model a corpus as entities (nodes) and relationships (typed edges), enabling structured traversal at query time.
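As a rough illustration of the multi-hop case, the sketch below stores typed edges in a plain dictionary and answers "how is A connected to C?" with a breadth-first search. The triples, the `build_graph` and `find_path` helpers, and the `~` prefix for reversed edges are all hypothetical names chosen for this example, not part of any library.

```python
from collections import defaultdict, deque

# Hypothetical toy triples (head, relation, tail); not taken from any real corpus.
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "SUBSIDIARY_OF", "Globex"),
    ("Bob", "WORKS_AT", "Globex"),
]


def build_graph(triples):
    """Index typed edges by node; store a reversed edge too so paths can run both ways."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"~{relation}", head))  # "~" marks a reversed edge
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns [node, relation, node, ...] or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [relation, neighbor])
    return None


graph = build_graph(TRIPLES)
print(find_path(graph, "Alice", "Bob"))
```

The returned path names every intermediate entity and relation, which is exactly the information a similarity ranking over isolated chunks does not expose.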
Cross-link: /resources/rag-retrieval-for-agents for the vector-RAG baseline.
Core concepts
Graph construction — an LLM reads each document chunk and extracts named
entities plus the relationships between them (e.g., Person –WORKS_AT→ Company).
This is expensive: each chunk costs one or more LLM calls. The output is a property
graph stored in a graph database.
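The extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `fake_llm` is a hypothetical stand-in for a real model call, the prompt wording and JSON shape are invented for the example, and lower-casing names is only a naive placeholder for real entity resolution.

```python
import json


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned JSON reply."""
    return json.dumps([
        {"head": "Alice", "relation": "WORKS_AT", "tail": "Acme"},
        {"head": "alice ", "relation": "WROTE", "tail": "Report"},
    ])


def extract_triples(chunk: str, llm=fake_llm):
    """Ask the model for triples and keep only well-formed records."""
    prompt = (
        "Extract entities and relationships from the text below as a JSON list of "
        'objects with keys "head", "relation", "tail".\n\n' + chunk
    )
    try:
        records = json.loads(llm(prompt))
    except json.JSONDecodeError:
        return []  # malformed model output: skip the chunk rather than crash
    if not isinstance(records, list):
        return []
    triples = []
    for record in records:
        if isinstance(record, dict) and {"head", "relation", "tail"} <= set(record):
            triples.append((record["head"], record["relation"], record["tail"]))
    return triples


def merge_into_graph(graph: dict, triples):
    """Add triples to a {node_key: {"name", "edges"}} dict, merging case variants of a name."""
    for head, relation, tail in triples:
        head_key, tail_key = head.strip().lower(), tail.strip().lower()
        graph.setdefault(head_key, {"name": head, "edges": set()})
        graph.setdefault(tail_key, {"name": tail, "edges": set()})
        graph[head_key]["edges"].add((relation, tail_key))
    return graph


graph = merge_into_graph({}, extract_triples("Alice works at Acme and wrote the report."))
print(sorted(graph))
```

Because every chunk goes through `extract_triples`, the number of model calls grows at least linearly with corpus size, which is where the build cost comes from.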
Community detection — clustering algorithms (e.g., Leiden) group densely connected entities into communities. The Microsoft GraphRAG system (arxiv:2404.16130) then pre-generates a hierarchical summary for each community using an LLM. These summaries are the backbone of global search.
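To make the idea concrete, here is a small sketch of grouping entities and attaching one summary per group. It deliberately swaps in connected components for Leiden: Leiden optimizes modularity and can split a single connected region into several communities, which this stand-in cannot do. The `summarize` helper is a hypothetical placeholder for the LLM summary call, and the sketch is flat rather than hierarchical.

```python
from collections import defaultdict

# Hypothetical undirected entity edges.
EDGES = [("Alice", "Acme"), ("Bob", "Acme"), ("Carol", "Initech"), ("Dave", "Initech")]


def connected_components(edges):
    """Group mutually reachable nodes (a crude stand-in for Leiden clustering)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(members, llm=None):
    """Placeholder for the per-community LLM summary; without an llm it lists the members."""
    if llm is not None:
        return llm("Summarize how these entities relate: " + ", ".join(members))
    return "Community of " + ", ".join(members)


communities = connected_components(EDGES)
summaries = {index: summarize(members) for index, members in enumerate(communities)}
print(summaries)
```

The `summaries` dictionary is computed once at index time, so global queries later read summaries instead of raw documents.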
Local search — uses entities as the query entry point. The query is embedded to find nearest-neighbor entities, then graph traversal expands outward through relationships and community context to build the prompt. Best for targeted, entity-specific questions.
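The entity-first flow can be sketched in a few lines. Everything here is illustrative: the bag-of-words `embed` function stands in for a real embedding model, the entity descriptions and edges are made up, and community context is left out to keep the example short.

```python
import math
from collections import Counter

# Hypothetical entity descriptions and typed edges.
ENTITIES = {
    "Acme": "Acme is a robotics company",
    "Alice": "Alice is an engineer who designs robot arms",
    "Globex": "Globex is a holding company that owns Acme",
}
EDGES = [("Alice", "WORKS_AT", "Acme"), ("Globex", "OWNS", "Acme")]


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def local_search_context(query, top_k=1, hops=1):
    """Pick the entities closest to the query, then expand outward along edges."""
    query_vec = embed(query)
    ranked = sorted(ENTITIES, key=lambda name: cosine(query_vec, embed(ENTITIES[name])), reverse=True)
    frontier = set(ranked[:top_k])
    selected = set(frontier)
    facts = []
    for _ in range(hops):
        next_frontier = set()
        for head, relation, tail in EDGES:
            if head in frontier or tail in frontier:
                fact = f"{head} -{relation}-> {tail}"
                if fact not in facts:
                    facts.append(fact)
                next_frontier |= {head, tail} - selected
        selected |= next_frontier
        frontier = next_frontier
    descriptions = [ENTITIES[name] for name in sorted(selected)]
    return "\n".join(descriptions + facts)


print(local_search_context("Who designs robot arms?"))
```

Raising `hops` widens the context at the cost of a longer prompt, so the hop limit is effectively a token-budget knob.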
Global search — instead of traversing, it broadcasts the query across all pre-computed community summaries, collects partial answers (MAP step), and aggregates them into a final response (REDUCE step). Best for whole-corpus sensemaking and thematic questions.
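The map and reduce steps can be sketched as below. The two `fake_*_llm` functions are hypothetical stand-ins for model calls, the summaries are invented, and the scoring and filtering rule is an assumption made for illustration rather than a description of any particular implementation.

```python
import re

# Hypothetical pre-computed community summaries.
COMMUNITY_SUMMARIES = [
    "Robotics vendors are consolidating through acquisitions.",
    "Regulators are drafting safety rules for warehouse robots.",
    "A local bakery changed its opening hours.",
]


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def fake_map_llm(query, summary):
    """Stand-in for the MAP call: a partial answer plus a crude relevance score."""
    return {"answer": summary, "score": len(tokens(query) & tokens(summary))}


def fake_reduce_llm(query, partial_answers):
    """Stand-in for the REDUCE call: a real model would synthesize, this just joins."""
    return " ".join(partial_answers)


def global_search(query, summaries, map_llm=fake_map_llm, reduce_llm=fake_reduce_llm, max_partials=5):
    partials = [map_llm(query, summary) for summary in summaries]  # MAP over every community
    useful = [p for p in partials if p["score"] > 0]               # drop irrelevant partials
    useful.sort(key=lambda p: p["score"], reverse=True)
    top_answers = [p["answer"] for p in useful[:max_partials]]
    return reduce_llm(query, top_answers)                          # REDUCE into one response


answer = global_search("robotics and robots outlook", COMMUNITY_SUMMARIES)
print(answer)
```

Note that the MAP step touches every community summary on every query, so query cost scales with the number of communities rather than with the number of relevant ones.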
Approaches and variants
Microsoft GraphRAG (microsoft/graphrag) — the reference implementation of
the arxiv:2404.16130 paper ("From Local to Global: A Graph RAG Approach to
Query-Focused Summarization"). Produces a full Leiden-community hierarchy with
pre-generated summaries. Also supports a DRIFT search mode that combines global
and local search. Documented at https://microsoft.github.io/graphrag/ and
https://github.com/microsoft/graphrag. Note: the repository describes itself as a
demonstration rather than an officially supported Microsoft offering, and it warns
that indexing can be expensive, so start with a small corpus.
LightRAG (HKUDS/LightRAG, arXiv:2410.05779, EMNLP 2025) — a lighter
alternative that builds a dual-layer knowledge graph (entities + higher-level
concepts) alongside vector embeddings. Supports five query modes: local, global,
hybrid, naive (pure vector), and mix (default). Incremental graph updates avoid
full re-indexing. GitHub: https://github.com/HKUDS/LightRAG
HippoRAG (OSU-NLP-Group/HippoRAG, arXiv:2405.14831, NeurIPS 2024) —
inspired by hippocampal indexing theory. Combines knowledge graphs with
Personalized PageRank to model associative memory. The paper reports gains of up
to 20% on multi-hop QA over prior retrieval methods, with single-step retrieval
that matches or beats iterative retrieval (IRCoT) while being 10-30 times cheaper
and 6-13 times faster. HippoRAG 2 (arXiv:2502.14802, ICML 2025) extends to
continual learning. GitHub: https://github.com/OSU-NLP-Group/HippoRAG
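For readers unfamiliar with Personalized PageRank, the sketch below shows the generic algorithm by power iteration on a toy undirected graph. It is not HippoRAG's implementation: the edge list, the `personalized_pagerank` helper, and the parameter defaults are assumptions for illustration, and seeds are assumed to be nodes that appear in the graph.

```python
from collections import defaultdict

# Hypothetical undirected entity graph; "carol"/"initech" are disconnected from the rest.
EDGES = [("alice", "acme"), ("acme", "globex"), ("globex", "bob"), ("carol", "initech")]


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Random walk with restart: with probability 1 - damping, jump back to a seed node."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * scores[n] / len(neighbors[n])  # split mass across neighbors
            for m in neighbors[n]:
                updated[m] += share
        scores = updated
    return scores


scores = personalized_pagerank(EDGES, seeds={"alice"})
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

Seeding the walk with entities found in the query concentrates score on nodes that are close to those entities in the graph, so related items several hops away still receive a nonzero rank in a single retrieval step.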
Hybrid vector + graph — combine a vector store for chunk retrieval with a
graph store for relational traversal. LlamaIndex PropertyGraphIndex and
Neo4j's neo4j-graphrag-python package both support this pattern natively.
Storage
Property graphs — nodes and edges carry key-value properties. Neo4j (Cypher query language) is the dominant choice; others include Amazon Neptune (supports both property graph and RDF) and Memgraph. Most GraphRAG tooling targets property graphs.
RDF / triple stores — represent facts as subject–predicate–object triples, queried via SPARQL. Stronger semantic interoperability (W3C standards, ontology reasoning) but heavier join overhead at scale. Less common in LLM-era agent pipelines.
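The triple model and the join cost it implies can be illustrated without a real store. The sketch below is plain Python, not SPARQL: the `ex:` identifiers, the `match` helper, and the `join_people_to_cities` function are hypothetical, and the nested loop mimics what a two-pattern query has to do.

```python
# Hypothetical triples (subject, predicate, object).
TRIPLES = {
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
    ("ex:Acme", "ex:locatedIn", "ex:Berlin"),
}


def match(pattern, triples=TRIPLES):
    """Return triples matching a pattern in which None acts as a wildcard."""
    return sorted(t for t in triples if all(p is None or p == v for p, v in zip(pattern, t)))


def join_people_to_cities(triples=TRIPLES):
    """Roughly: ?person ex:worksAt ?org . ?org ex:locatedIn ?city"""
    results = []
    for person, _, org in match((None, "ex:worksAt", None), triples):
        for _, _, city in match((org, "ex:locatedIn", None), triples):  # join on ?org
            results.append((person, city))
    return results


print(join_people_to_cities())
```

Each extra hop adds another pattern and another join over the triple set, whereas a property graph typically follows stored adjacency from node to node.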
Tooling:
- LlamaIndex PropertyGraphIndex: constructs a property graph from documents via LLM extraction, stores it in a pluggable graph backend (Neo4j, in-memory, etc.), and exposes multiple retriever types including keyword-entity lookup and vector-based graph node retrieval. Docs: https://developers.llamaindex.ai/python/framework/module_guides/indexing/lpg_index_guide/
- LangChain langchain-neo4j (GraphCypherQAChain): generates Cypher queries from natural language against a Neo4j graph. The LLM is given the graph schema and produces a Cypher query; the chain runs that query against the database, and the LLM then answers from the returned rows. Docs: https://python.langchain.com/docs/integrations/graphs/neo4j_cypher/
- neo4j-graphrag-python: Neo4j's own Python package for building RAG pipelines over Neo4j, including a Knowledge Graph Builder pipeline that extracts entities from unstructured text. Docs: https://neo4j.com/docs/neo4j-graphrag-python/current/
Tradeoffs
| Dimension | Vector RAG | GraphRAG |
|---|---|---|
| Build cost | Low (embed chunks) | High (many LLM calls per chunk) |
| Update / freshness | Re-embed changed chunks | Re-extract affected subgraph |
| Multi-hop queries | Poor | Strong |
| Global summarization | Poor | Strong |
| Operational complexity | Low | High |
| Best for | Fact lookup, semantic search | Relational, entity-centric, whole-corpus |
Use GraphRAG when: your queries span multiple entities and require traversing relationships; you need "what is the overall picture" summaries; or entities appear under many surface forms across documents.
Stick with plain RAG when: questions are single-hop or semantic; the corpus is small or fast-changing; or build cost/latency constraints are tight.
Agentic angle
An agent can treat a knowledge graph as a tool: issue graph queries (Cypher, SPARQL, or a higher-level API) as discrete tool calls, inspect the subgraph returned, and decide whether to traverse further. This fits naturally into MCP or function-calling patterns — each traversal step is a tool call with a verifiable intermediate result.
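A graph tool of this kind might look like the sketch below. The tool name `graph_neighbors`, the in-memory `GRAPH`, and the call envelope with `name` and `arguments` keys are assumptions for illustration; real function-calling and MCP formats differ in detail between providers.

```python
import json

# Hypothetical in-memory graph: entity -> list of (relation, target).
GRAPH = {
    "Alice": [("WORKS_AT", "Acme")],
    "Acme": [("SUBSIDIARY_OF", "Globex")],
    "Globex": [],
}

# Tool description in the JSON-schema style commonly used for function calling.
NEIGHBORS_TOOL = {
    "name": "graph_neighbors",
    "description": "Return outgoing edges of an entity in the knowledge graph.",
    "parameters": {
        "type": "object",
        "properties": {
            "entity": {"type": "string"},
            "relation": {"type": "string", "description": "Optional edge type filter."},
        },
        "required": ["entity"],
    },
}


def graph_neighbors(entity, relation=None):
    """One traversal step; errors come back as data so the agent can react to them."""
    if entity not in GRAPH:
        return {"error": f"unknown entity: {entity}"}
    edges = [{"relation": r, "target": t} for r, t in GRAPH[entity] if relation in (None, r)]
    return {"entity": entity, "edges": edges}


def handle_tool_call(call_json):
    """Dispatch a JSON tool call and return a JSON result string."""
    call = json.loads(call_json)
    if call.get("name") != "graph_neighbors":
        return json.dumps({"error": "unknown tool"})
    return json.dumps(graph_neighbors(**call.get("arguments", {})))


# Two traversal steps, as an agent loop might issue them.
first = json.loads(handle_tool_call('{"name": "graph_neighbors", "arguments": {"entity": "Alice"}}'))
next_entity = first["edges"][0]["target"]
second = json.loads(handle_tool_call(json.dumps({"name": "graph_neighbors", "arguments": {"entity": next_entity}})))
print(second)
```

Keeping each call to a single hop means every intermediate subgraph is visible in the transcript, which makes a wrong turn easy to spot and retry.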
The most robust production pattern combines vector retrieval (fast chunk lookup) with graph traversal (relational context): the vector index answers "what chunks are relevant?" and the graph answers "how are these entities connected?".
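The division of labor between the two stores can be sketched as follows. This is a toy under stated assumptions: token overlap stands in for a vector index, the chunks carry hypothetical pre-extracted entity mentions, and `hybrid_retrieve` is an illustrative name rather than an API from any of the tools listed earlier.

```python
import re

# Hypothetical chunks with pre-extracted entity mentions, plus typed graph edges.
CHUNKS = {
    "c1": {"text": "Alice leads the robotics team at Acme.", "entities": ["Alice", "Acme"]},
    "c2": {"text": "Globex reported record revenue this year.", "entities": ["Globex"]},
    "c3": {"text": "The cafeteria menu changes weekly.", "entities": []},
}
EDGES = [("Alice", "WORKS_AT", "Acme"), ("Acme", "SUBSIDIARY_OF", "Globex")]


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def vector_step(query, top_k=1):
    """Stand-in for a vector index: ranks chunks by Jaccard token overlap."""
    query_tokens = tokens(query)

    def score(chunk_id):
        chunk_tokens = tokens(CHUNKS[chunk_id]["text"])
        return len(query_tokens & chunk_tokens) / len(query_tokens | chunk_tokens)

    return sorted(CHUNKS, key=score, reverse=True)[:top_k]


def graph_step(entities):
    """Return every edge that touches one of the given entities."""
    return [edge for edge in EDGES if edge[0] in entities or edge[2] in entities]


def hybrid_retrieve(query, top_k=1):
    chunk_ids = vector_step(query, top_k)                                   # what is relevant?
    entities = {e for cid in chunk_ids for e in CHUNKS[cid]["entities"]}
    return {
        "chunks": [CHUNKS[cid]["text"] for cid in chunk_ids],
        "edges": graph_step(entities),                                      # how is it connected?
    }


result = hybrid_retrieve("Who leads robotics at Acme?")
print(result)
```

In this toy run the graph step surfaces the Acme to Globex relationship even though the retrieved chunk never mentions Globex, which is the kind of context a chunk-only pipeline would miss.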
Cross-links: /resources/reliable-tool-calling · /resources/embeddings-vector-search · /resources/agent-memory-context
Verified sources
- Microsoft GraphRAG repo: https://github.com/microsoft/graphrag
- Microsoft GraphRAG docs: https://microsoft.github.io/graphrag/
- GraphRAG paper (arXiv:2404.16130): https://arxiv.org/abs/2404.16130
- LightRAG repo (HKUDS, arXiv:2410.05779): https://github.com/HKUDS/LightRAG
- HippoRAG repo (OSU-NLP-Group, arXiv:2405.14831): https://github.com/OSU-NLP-Group/HippoRAG
- HippoRAG 2 (arXiv:2502.14802): https://arxiv.org/abs/2502.14802
- LlamaIndex PropertyGraphIndex guide: https://developers.llamaindex.ai/python/framework/module_guides/indexing/lpg_index_guide/
- LangChain Neo4j integration: https://python.langchain.com/docs/integrations/graphs/neo4j_cypher/
- Neo4j GraphRAG Python package: https://neo4j.com/docs/neo4j-graphrag-python/current/