RAG Is Not Enough: When to Use Fine-Tuning, Agents, or Knowledge Graphs
Decision framework for choosing between RAG, fine-tuning, agentic retrieval, and knowledge graphs based on data freshness, reasoning depth, cost, latency, and accuracy requirements.
Retrieval-augmented generation has become the default answer to "how should we build enterprise AI." And for good reason — RAG addresses the most common LLM limitations (knowledge cutoffs and hallucination) with a relatively straightforward architecture: index your documents, retrieve relevant chunks, feed them to the model.
But RAG is one pattern in a family of four. Enterprises that treat it as the only pattern end up building increasingly complex RAG pipelines to solve problems that RAG was never designed to solve. This post provides a decision framework for choosing between RAG, fine-tuning, agentic retrieval, and knowledge graphs — and for recognizing when the right answer is a combination of approaches.
The Four Patterns
Pattern 1: Retrieval-Augmented Generation (RAG)
The model receives retrieved context alongside the user query. The knowledge lives in an external index, not in the model weights.
Strengths:
- Knowledge can be updated without retraining
- Source attribution is straightforward
- Works well for factual Q&A over a known corpus
- Cost-effective — no model training required
Limitations:
- Retrieval quality is the ceiling. If the right chunks are not retrieved, the answer is wrong.
- Chunk boundaries are arbitrary. Important context often spans multiple chunks.
- Multi-hop reasoning is weak. Questions like "Which suppliers had delivery delays in Q3 that also had quality issues in Q4?" require synthesis across many documents.
- Context window saturation. Stuffing 20 chunks into the context often degrades answer quality compared to 5 well-chosen chunks.
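Concretely, the whole single-shot pipeline is a few steps. Here is a minimal sketch, assuming hypothetical `IEmbedder`, `IVectorIndex`, and `IChatModel` interfaces as stand-ins for whatever embedding model, vector store, and LLM client you actually run:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for your embedding model, vector store, and LLM client.
public record Chunk(string SourceId, string Text);
public interface IEmbedder    { Task<float[]> EmbedAsync(string text); }
public interface IVectorIndex { Task<IReadOnlyList<Chunk>> SearchAsync(float[] query, int topK); }
public interface IChatModel   { Task<string> CompleteAsync(string prompt); }

public sealed class SimpleRagPipeline(IEmbedder embedder, IVectorIndex index, IChatModel chat)
{
    public async Task<string> AnswerAsync(string question)
    {
        // 1. Embed the query and retrieve the most similar chunks.
        float[] queryVector = await embedder.EmbedAsync(question);
        var chunks = await index.SearchAsync(queryVector, topK: 5);

        // 2. Knowledge lives in the prompt, not the weights: retrieved text
        //    is passed to the model alongside the user question.
        string context = string.Join("\n---\n",
            chunks.Select(c => $"[{c.SourceId}] {c.Text}"));

        return await chat.CompleteAsync($"""
            Answer using only the context below. Cite source ids.

            Context:
            {context}

            Question: {question}
            """);
    }
}
```

Every limitation above traces back to step 1: if `SearchAsync` misses the right chunks, nothing downstream can recover the answer.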
Pattern 2: Fine-Tuning
The model weights are updated with domain-specific training data. The knowledge and reasoning patterns are encoded into the model itself.
Strengths:
- Consistent output format and style (critical for regulatory documents)
- Domain-specific reasoning patterns (medical differential diagnosis, legal analysis)
- Lower latency — no retrieval step
- Smaller context windows needed per request (lower per-query cost)
Limitations:
- Expensive to train and maintain. Fine-tuning GPT-4o costs significantly more than building a RAG pipeline.
- Knowledge becomes stale. Retraining is required to incorporate new information.
- Catastrophic forgetting risk. Aggressive fine-tuning can degrade the model's general capabilities.
- Evaluation is harder. You need domain-expert evaluation of fine-tuned outputs.
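To make "domain-specific training data" concrete: for chat models, supervised fine-tuning data is typically a JSONL file of example conversations, one per line. The example below follows the message format OpenAI's fine-tuning API uses; the firm name and clause text are illustrative placeholders (wrapped here for readability):

```json
{"messages": [
  {"role": "system", "content": "You draft contract clauses in Acme Legal house style."},
  {"role": "user", "content": "Draft a confidentiality clause for a vendor agreement."},
  {"role": "assistant", "content": "Confidentiality. Each party shall hold the other party's Confidential Information in strict confidence and shall not disclose it except as expressly permitted herein..."}
]}
```

Hundreds to thousands of such examples teach the model a house style that retrieval alone cannot.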
When to Fine-Tune Instead of RAG
| Scenario | RAG | Fine-Tuning | Why |
|---|---|---|---|
| Answer questions about company policies | Best | Overkill | Policies are document-based, RAG retrieves directly |
| Draft legal contracts in house style | Insufficient | Best | Style consistency requires model-level learning |
| Medical report generation | Insufficient | Best | Domain reasoning patterns need to be internalized |
| Customer support over product docs | Best | Overkill | Factual retrieval, docs change frequently |
| Code generation in proprietary framework | Moderate | Best | Framework patterns need model-level understanding |
| Translate technical docs to plain language | Moderate | Best | Consistent tone and simplification patterns |
Pattern 3: Agentic Retrieval
An AI agent decides what to retrieve, when to retrieve it, and whether the retrieved information is sufficient. Unlike basic RAG where retrieval happens once, agentic retrieval is iterative and reasoning-driven.
Strengths:
- Handles multi-hop reasoning naturally. The agent decomposes complex questions into retrievable sub-questions.
- Multi-source retrieval. The agent can query vector stores, SQL databases, APIs, and knowledge graphs in the same interaction.
- Self-correcting. If initial retrieval results are poor, the agent can reformulate and retry.
- Dynamic tool selection. The agent chooses the right retrieval method based on the query type.
Limitations:
- Higher latency. Multiple retrieval rounds can mean 3-10x the latency of single-shot RAG.
- Higher cost. Each retrieval round and reasoning step consumes tokens.
- Non-deterministic. The same query may follow different retrieval paths on different runs.
- Requires guardrails. Without budget caps and iteration limits, agents can enter retrieval loops.
Implementation: Agentic RAG with Semantic Kernel
```csharp
// Agentic RAG — agent decides retrieval strategy
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;

// `endpoint` and `credential` are your Azure OpenAI endpoint and Azure credential.
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion("gpt-4o", endpoint, credential)
    .Build();

// Multiple retrieval tools available to the agent.
// The four plugin types are your own classes exposing kernel functions.
kernel.Plugins.AddFromType<VectorSearchPlugin>(); // Semantic search
kernel.Plugins.AddFromType<SqlQueryPlugin>();     // Structured data
kernel.Plugins.AddFromType<GraphQueryPlugin>();   // Knowledge graph
kernel.Plugins.AddFromType<ApiPlugin>();          // External APIs

var agent = new ChatCompletionAgent
{
    Name = "ResearchAgent",
    Instructions = """
        You are a research agent with access to multiple data sources.
        For factual questions about documents, use vector_search.
        For questions involving numbers, dates, or comparisons, use sql_query.
        For questions about relationships between entities, use graph_query.
        For real-time external data, use api_call.
        Always verify your findings by cross-referencing at least two sources
        when possible. If initial results are insufficient, reformulate your
        query and try again. Cite your sources in the final response.
        Maximum retrieval rounds: 5. If you cannot find sufficient information
        in 5 rounds, state what you found and what remains uncertain.
        """,
    Kernel = kernel,
};
```

Pattern 4: Knowledge Graphs
Entities, relationships, and hierarchies are extracted from documents and stored in a graph database. At query time, graph traversal provides structured, relationship-aware context to the LLM.
Strengths:
- Multi-hop reasoning is native. "Who manages the team responsible for the product that had the most quality issues?" traverses the graph directly.
- Structured relationships. Unlike flat text chunks, graph context preserves entity types, relationship types, and hierarchies.
- Explainability. The graph traversal path is the reasoning chain — fully auditable.
- Complementary to RAG. Graph context provides structural understanding; text chunks provide detail.
Limitations:
- Graph construction is expensive. Entity extraction and relationship mapping require significant upfront investment.
- Maintenance burden. The graph must be kept in sync with source documents.
- Schema design complexity. A poorly designed graph schema produces irrelevant traversals.
- Query patterns must be anticipated. The graph is only as useful as the traversal queries designed for it.
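To see what "multi-hop is native" means in practice, the earlier question ("Who manages the team responsible for the product that had the most quality issues?") compiles to a single traversal. A Cypher sketch, assuming a hypothetical schema with MANAGES, RESPONSIBLE_FOR, and HAD_ISSUE relationships:

```cypher
// Hypothetical schema:
// (:Person)-[:MANAGES]->(:Team)-[:RESPONSIBLE_FOR]->(:Product)-[:HAD_ISSUE]->(:QualityIssue)
MATCH (p:Product)-[:HAD_ISSUE]->(q:QualityIssue)
WITH p, count(q) AS issueCount
ORDER BY issueCount DESC
LIMIT 1
MATCH (m:Person)-[:MANAGES]->(t:Team)-[:RESPONSIBLE_FOR]->(p)
RETURN m.name AS manager, t.name AS team, p.name AS product, issueCount;
```

The traversal path (manager → team → product → issues) is itself the auditable reasoning chain noted above.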
The Decision Framework
Step 1: Classify Your Query Types
Analyze a representative sample of queries your system will handle. Classify each query:
| Query Type | Example | Best Pattern |
|---|---|---|
| Factual lookup | "What is our return policy?" | RAG |
| Multi-document synthesis | "Summarize all risks from Q1 audit reports" | RAG (with reranking) |
| Relational reasoning | "Which vendors supply components to products with open recalls?" | Knowledge Graph |
| Numerical/aggregation | "What was total revenue from EMEA in H2?" | Agentic (SQL tool) |
| Format-consistent generation | "Draft a board memo in our standard format" | Fine-Tuning |
| Multi-step research | "Compare our pricing strategy to competitors and identify gaps" | Agentic RAG |
| Domain-specific reasoning | "Assess the legal risk of this contract clause" | Fine-Tuning + RAG |
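A cheap way to run this classification over your query sample is an LLM triage pass. A sketch, reusing the hypothetical `IChatModel` interface from the RAG example above, with labels mirroring the table:

```csharp
using System;
using System.Threading.Tasks;

public enum QueryType
{
    FactualLookup, MultiDocumentSynthesis, RelationalReasoning, NumericalAggregation,
    FormatConsistentGeneration, MultiStepResearch, DomainSpecificReasoning
}

public static class QueryTriage
{
    // Classify one sampled query. Run this over a few hundred real queries and
    // the resulting distribution tells you which pattern to invest in first.
    public static async Task<QueryType> ClassifyAsync(IChatModel chat, string query)
    {
        string prompt = $"""
            Classify the query into exactly one of these labels:
            {string.Join(", ", Enum.GetNames<QueryType>())}

            Query: {query}
            Respond with the label only.
            """;
        string label = (await chat.CompleteAsync(prompt)).Trim();
        return Enum.Parse<QueryType>(label, ignoreCase: true);
    }
}
```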
Step 2: Evaluate Constraints
| Constraint | RAG | Fine-Tuning | Agentic | Knowledge Graph |
|---|---|---|---|---|
| Data changes daily | Good | Poor | Good | Moderate |
| Latency under 2 seconds | Good | Best | Poor | Moderate |
| Cost per query under $0.01 | Good | Best | Poor | Good |
| Must cite sources | Good | Poor | Good | Good |
| 99.5% accuracy required | Moderate | Good | Moderate | Good |
| Reasoning over relationships | Poor | Poor | Good | Best |
| Team has ML engineers | Not needed | Required | Helpful | Required |
| Regulatory audit trail | Moderate | Poor | Good (with logging) | Best |
Step 3: Consider Combinations
The most effective enterprise systems combine patterns:
GraphRAG — Knowledge graph provides structural context. RAG provides detailed text from relevant chunks. The LLM receives both.
Agentic RAG — An agent orchestrates retrieval from multiple sources, including RAG pipelines, SQL databases, and APIs. The agent decides what to retrieve and when.
Fine-Tuned Model + RAG — The model is fine-tuned for domain-specific reasoning patterns and output format. RAG provides current factual context. The fine-tuned model produces higher quality outputs from the same retrieved context because it understands the domain deeply.
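As a sketch of the GraphRAG combination, assuming a hypothetical `IGraphStore` alongside the interfaces from the earlier RAG example: graph traversal supplies structure, vector search supplies detail, and both land in one grounded prompt.

```csharp
using System.Linq;
using System.Threading.Tasks;

public interface IGraphStore
{
    // Returns a textual rendering of entities and relationships around a focus entity.
    Task<string> TraverseAsync(string focusEntity);
}

public static class GraphRag
{
    public static async Task<string> AnswerAsync(
        IGraphStore graph, IEmbedder embedder, IVectorIndex index, IChatModel chat,
        string question, string focusEntity)
    {
        // Structural context: the neighborhood of the focus entity in the graph.
        string graphContext = await graph.TraverseAsync(focusEntity);

        // Detail context: the most relevant text chunks, as in plain RAG.
        var chunks = await index.SearchAsync(await embedder.EmbedAsync(question), topK: 5);
        string textContext = string.Join("\n---\n", chunks.Select(c => c.Text));

        return await chat.CompleteAsync($"""
            Use the relationship graph for structure and the excerpts for detail.

            Graph context:
            {graphContext}

            Document excerpts:
            {textContext}

            Question: {question}
            """);
    }
}
```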
Enterprise Scenario Examples
Scenario: Internal legal review assistant
- Query type: Contract clause analysis, risk assessment, precedent lookup
- Pattern: Fine-tuning (legal reasoning) + RAG (clause retrieval) + Knowledge Graph (precedent relationships)
- Why: Legal reasoning requires domain-internalized patterns. Specific clauses need retrieval. Precedent relationships are graph-native.
Scenario: Customer support bot
- Query type: Product questions, order status, troubleshooting
- Pattern: RAG (product docs) + Agentic (order lookup via API)
- Why: Product knowledge is document-based (RAG). Order data is structured (agent with API tools). No fine-tuning needed — generic model handles conversational style well.
Scenario: Supply chain risk analysis
- Query type: Multi-hop reasoning across suppliers, components, geographies, and risk factors
- Pattern: Knowledge Graph (supplier/component/risk relationships) + Agentic RAG (current news and reports)
- Why: Supply chain relationships are inherently graph-structured. Current risk assessment requires real-time retrieval.
Scenario: Medical literature review
- Query type: Synthesize findings across clinical studies, identify contradictions
- Pattern: Fine-tuning (medical reasoning) + Knowledge Graph (study/drug/condition relationships) + RAG (study details)
- Why: Medical reasoning requires internalized domain knowledge. Study relationships are graph-native. Individual study details need retrieval.
Cost and Complexity Comparison
| Aspect | RAG | Fine-Tuning | Agentic | Knowledge Graph |
|---|---|---|---|---|
| Setup cost | Low ($5-20K) | Medium ($20-50K) | Medium ($15-40K) | High ($50-150K) |
| Per-query cost | $0.002-0.01 | $0.001-0.005 | $0.01-0.10 | $0.005-0.02 |
| Maintenance burden | Low (reindex) | High (retrain) | Medium (tools) | High (graph sync) |
| Time to production | 2-4 weeks | 4-8 weeks | 4-8 weeks | 8-16 weeks |
| Team skills needed | ML engineer | ML + domain expert | ML + backend | ML + knowledge eng. |
| Quality ceiling | Moderate | High (for domain) | High | High (for relationships) |
Pattern Selection Decision Flow
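Condensed, the framework above reduces to a short flow:

```text
Start with RAG (factual Q&A over a known corpus)
 ├─ Queries need multiple sources or retrieval rounds?   → add agentic retrieval
 ├─ Relationships between entities are the core value?   → add a knowledge graph
 └─ Output format or domain reasoning still falls short? → add fine-tuning
```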
Practical Recommendations
- Start with RAG. It solves 60-70% of enterprise knowledge retrieval use cases with the lowest investment. Prove the value before adding complexity.
- Add agentic retrieval when RAG hits a wall. If users consistently ask questions that require multiple retrieval rounds or cross-source synthesis, agentic retrieval addresses these gaps.
- Invest in knowledge graphs when relationships are the core value. Supply chains, organizational structures, regulatory frameworks, product dependencies — if the relationships between entities matter more than the entities themselves, a knowledge graph is justified.
- Fine-tune when output quality and consistency plateau. If your RAG system retrieves the right information but the model struggles to reason about it or produce domain-appropriate outputs, fine-tuning addresses the model capability gap.
- Never combine all four patterns from the start. Each pattern adds operational complexity. Prove each addition solves a specific, measured quality gap before adding the next.
The Honest Reality
RAG is genuinely sufficient for most enterprise AI applications today. The push toward agents, knowledge graphs, and fine-tuning should be driven by measured quality gaps, not by technology enthusiasm.
The enterprises that get the best ROI from their AI investments are the ones that:
- Build a solid RAG baseline first
- Measure where it fails with real user queries
- Add complexity only where the data justifies it
- Keep their architecture as simple as the problem allows
The enterprises that struggle are the ones that start with the most complex architecture because it sounds impressive, then spend months debugging a knowledge graph that a well-configured RAG pipeline would have outperformed.
Need help choosing the right AI retrieval architecture for your use case? Contact our team — we help enterprises design AI systems that match the complexity of the solution to the complexity of the problem.