ChangeGamer

← All resources

Fine-Tuning vs RAG vs Prompting

Guide · updated 2026-06-21 · Markdown variant

Decision guide for agent builders: when to use prompting, RAG, or fine-tuning — and how they combine. Covers SFT, LoRA/QLoRA, DPO, distillation, and a symptom-to-fix table.


The mental model

These three techniques solve different problems and are routinely combined in production systems. Treating them as competitors leads to the wrong choice every time.

Key rule: knowledge that changes often belongs in RAG, not fine-tuned weights.

Try in this order

  1. Prompting first — clearer instructions, few-shot examples, structured delimiters, output schemas. See /resources/prompt-context-engineering. Fastest to iterate; fully reversible.
  2. Add RAG before fine-tuning if the bottleneck is missing or stale knowledge. See /resources/rag-retrieval-for-agents.
  3. Fine-tune only when you have a persistent behavioral defect that prompting and RAG cannot fix, and you have enough high-quality labeled examples to train on.

Fine-tuning methods

Supervised fine-tuning (SFT) — train on input/output pairs demonstrating desired behavior. OpenAI documents SFT as a supported method for style, format, and task adaptation (platform.openai.com/docs/guides/supervised-fine-tuning).

Parameter-efficient fine-tuning (PEFT) — instead of updating all weights, inject small trainable matrices. The dominant method is LoRA (Low-Rank Adaptation, Hu et al., arXiv:2106.09685): freeze the base weights and add a pair of low-rank matrices (W = W₀ + AB) to each transformer layer. QLoRA extends LoRA by quantizing the base weights to 4-bit before adding the adapters, dramatically reducing GPU memory requirements. PEFT methods produce swappable adapter files that share the base model, making multi-task serving much cheaper than keeping separate full copies.

Preference fine-tuning (DPO / RLHF) — align the model to human preferences via ranked pairs of outputs (preferred vs. rejected). RLHF (Reinforcement Learning from Human Feedback) uses a learned reward model and policy-gradient updates. DPO (Direct Preference Optimization, Rafailov et al., arXiv:2305.18290) simplifies this: it directly optimizes a classification-style loss over preference pairs, eliminating the separate reward model and RL training loop, while matching or exceeding RLHF quality. Standard practice is SFT first, then DPO.

Distillation — train a smaller model to mimic a larger one's outputs on a narrow task. Use when you need a smaller, cheaper, faster model that matches a frontier model on a specific task. Requires a dataset of (input, large-model-output) pairs. Cross-link: /resources/open-weight-models-for-agents for which base models are fine-tunable.

Symptom-to-fix table

Symptom Likely fix
Model lacks current or proprietary facts RAG
Output format or schema is unreliable Better prompt + structured outputs; fine-tune (SFT) if persistent
Tone or style is wrong Improve system prompt; fine-tune (SFT) if consistent across inputs
Model too slow or expensive at scale Distillation/fine-tune a smaller model; see /resources/agent-cost-latency-optimization
Model makes tool-calling mistakes Structured output + typed schemas; SFT on tool-call examples
Need model to follow complex instructions reliably Few-shot prompting first; SFT if it fails at scale
Behavior must reflect human ranking preferences DPO or RLHF after SFT

Honest tradeoffs of fine-tuning

Fine-tuning carries real costs that teams underestimate:

Combining all three

The 2026 production default for complex agents is: a fine-tuned (or instruction-tuned) model that has been preference-aligned, served with RAG for current knowledge, and steered per-request via structured system prompts. These layers are additive: adding RAG to a fine-tuned model is normal; adding a better system prompt to a RAG-augmented fine-tuned model is normal.

Cross-links: /resources/rag-retrieval-for-agents · /resources/prompt-context-engineering · /resources/reliable-tool-calling · /resources/agent-cost-latency-optimization

Verified sources

#fine-tuning #rag #prompting #lora #dpo #sft #distillation #agents #decision-guide

Category: Guide