Catch RAG failures before your users do.
RAG-Forge audits any RAG pipeline against the RAG Maturity Model. Detect hallucinations, retrieval bypass, silent quality regressions, and cost drift before they ship — with a single CLI that works on your existing stack.
The RAG quality crisis
RAG has become the dominant architecture for enterprise AI. Yet the ecosystem suffers from a critical gap between building RAG pipelines and knowing whether they actually work.
of teams cite quality as the #1 GenAI deployment barrier
LangChain State of AI Agents 2026
is where most production RAG pipelines actually sit — naive vector search with no quality framework
RAG-Forge Maturity Model
open-source frameworks score any pipeline against a maturity model with framework-agnostic CLI tooling — RAG-Forge is one of them
RAG-Forge
Everything you need to ship a production RAG pipeline
Pipeline Primitives
Five chunking strategies, dense + sparse + hybrid retrieval, contextual enrichment, and reranking. Bring your own embedding model.
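As an illustration of the hybrid retrieval mentioned here, dense and sparse rankings are often merged with Reciprocal Rank Fusion (RRF), which the RMM-1 gate further down also names. The following is a minimal Python sketch of RRF, not RAG-Forge's own implementation: the function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (vector) and a sparse (keyword) retriever.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that appear near the top of both lists (here `doc_b` and `doc_a`) rise above documents that only one retriever found, which is the behavior hybrid search relies on.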
Evaluation as a CI/CD Gate
RAGAS, DeepEval, and LLM-as-Judge baked in. Cost + time estimates before each run, skip-aware aggregation, configurable thresholds in rag-forge.config.ts.
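To make the threshold gate concrete, here is a hedged Python sketch. It assumes "skip-aware aggregation" means averaging only the samples that were actually scored, so skipped samples do not drag a metric toward zero. The `SampleResult` type, the metric names, and the threshold values are hypothetical and are not taken from the `rag-forge.config.ts` schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleResult:
    metric: str
    score: Optional[float]  # None marks a skipped sample (assumed convention)


def aggregate(results):
    """Mean score per metric, computed over non-skipped samples only."""
    totals = {}
    for result in results:
        if result.score is None:
            continue  # skipped samples do not count toward the mean
        total, count = totals.get(result.metric, (0.0, 0))
        totals[result.metric] = (total + result.score, count + 1)
    return {metric: total / count for metric, (total, count) in totals.items()}


def gate(results, thresholds):
    """Return the metrics that fail their minimum threshold.

    A metric with no scored samples at all is treated as failing.
    """
    means = aggregate(results)
    return [
        metric
        for metric, minimum in thresholds.items()
        if metric not in means or means[metric] < minimum
    ]


# Hypothetical evaluation output and thresholds.
results = [
    SampleResult("faithfulness", 1.0),
    SampleResult("faithfulness", None),  # skipped
    SampleResult("faithfulness", 0.75),
    SampleResult("context_recall", 0.5),
    SampleResult("context_recall", 0.75),
]
thresholds = {"faithfulness": 0.85, "context_recall": 0.70}
failures = gate(results, thresholds)
print(failures)
```

In a CI job, a wrapper script could exit with a non-zero status whenever the returned list is non-empty, which is what turns the evaluation into a merge gate.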
Built-in Observability
OpenTelemetry tracing on every pipeline stage. Drift detection, cost estimation, semantic caching.
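Semantic caching generally means reusing a stored answer when a new query is close enough in embedding space to a previous one. The sketch below shows that idea in plain Python under stated assumptions: the `SemanticCache` class, the 0.95 similarity threshold, and the two-dimensional toy embeddings are all hypothetical, and a real deployment would use a real embedding model and a vector index instead of a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache keyed by embedding similarity (illustrative only)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, embedding):
        """Return the most similar cached answer, or None below the threshold."""
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # nearly the same direction as the cached query
miss = cache.get([0.0, 1.0])    # orthogonal to the cached query
print(hit, miss)
```

The threshold is the main design choice: set it too low and unrelated questions receive stale answers, set it too high and the cache rarely hits.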
Production Templates
Five battle-tested starting points. They follow the shadcn/ui model: the template code is copied into your project, so you own every line of it.
The RAG Maturity Model
Where does your pipeline stand? Score any RAG system from RMM-0 (naive) to RMM-5 (enterprise).
- RMM-0
Naive
Basic vector search works
Gate: Vector retrieval returns results
- RMM-1
Better Recall
Hybrid search active, Recall@5 > 70%
Gate: Dense + sparse + RRF fusion
- RMM-2
Better Precision
Reranker active, nDCG@10 +10%
Gate: Cross-encoder reranking on top results
- RMM-3
Better Trust
Guardrails, faithfulness > 85%, citations (most pipelines stop here)
Gate: InputGuard + OutputGuard active
- RMM-4
Better Workflow
Caching, P95 < 4s, cost tracking
Gate: Semantic cache + telemetry + cost meter
- RMM-5
Enterprise
Drift detection, CI/CD gates, adversarial tests
Gate: All audit thresholds pass
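The levels above can be read as an ordered series of gates. The Python sketch below shows one way to turn them into a score. It is an interpretation rather than RAG-Forge's scoring logic: it assumes levels are cumulative (a pipeline sits at the highest level whose gate, and every gate before it, passes), it reads "nDCG@10 +10%" as a relative gain of at least 0.10, and every metric key in the dictionary is a hypothetical name.

```python
# Each entry: (level number, level name, gate predicate over a metrics dict).
# Thresholds mirror the list above; the metric keys are assumed names.
RMM_GATES = [
    (0, "Naive", lambda m: m.get("retrieval_returns_results", False)),
    (1, "Better Recall",
     lambda m: m.get("hybrid_search", False) and m.get("recall_at_5", 0.0) > 0.70),
    (2, "Better Precision",
     lambda m: m.get("reranker", False) and m.get("ndcg_at_10_gain", 0.0) >= 0.10),
    (3, "Better Trust",
     lambda m: m.get("guardrails", False) and m.get("faithfulness", 0.0) > 0.85),
    (4, "Better Workflow",
     lambda m: m.get("semantic_cache", False)
     and m.get("p95_latency_s", float("inf")) < 4.0),
    (5, "Enterprise",
     lambda m: m.get("drift_detection", False)
     and m.get("ci_gates", False)
     and m.get("adversarial_tests", False)),
]


def score_rmm(metrics):
    """Return (level, name) for the highest consecutive gate passed.

    Returns None when even the RMM-0 gate fails.
    """
    level = None
    for number, name, passes in RMM_GATES:
        if not passes(metrics):
            break  # levels are cumulative: stop at the first failed gate
        level = (number, name)
    return level


# Hypothetical audit output: good recall and reranking, but no guardrails yet.
metrics = {
    "retrieval_returns_results": True,
    "hybrid_search": True,
    "recall_at_5": 0.82,
    "reranker": True,
    "ndcg_at_10_gain": 0.12,
    "guardrails": False,
    "faithfulness": 0.91,
}
current = score_rmm(metrics)
print(current)
```

Note that in this example a high faithfulness score alone does not reach RMM-3, because the sketch requires the guardrails gate to be active as well.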
Get started in 60 seconds
# Install the CLI
npm install -g @rag-forge/cli
# Scaffold a project (use --directory to name the folder)
rag-forge init basic --directory my-rag-project
cd my-rag-project
# Drop your documents into a folder of your choice
mkdir docs
echo "RAG-Forge is a CLI for building and evaluating RAG pipelines." > docs/example.md
# Index your docs and run an audit
rag-forge index --source ./docs
rag-forge audit --golden-set eval/golden_set.json
How RAG-Forge compares
| Feature | RAG-Forge | LangChain | LlamaIndex | RAGAS |
|---|---|---|---|---|
| Framework agnostic (audit any pipeline) | yes | no | partial | yes |
| Evaluation built in (CI/CD gate) | yes | partial | partial | yes |
| RAG Maturity Model scoring | yes | no | no | no |
| OpenTelemetry native | yes | partial | no | no |
| MCP server | yes | no | no | no |
| CLI scaffolding | yes | no | partial | no |
| Code ownership (shadcn model) | yes | no | no | no |
| Drift detection | yes | no | no | no |
Comparison based on publicly available features as of April 2026.
Peer strengths worth knowing
- RAGAS: Deeper metric research and a larger community. RAG-Forge's evaluator supports RAGAS as a backend — `rag-forge audit --evaluator ragas`.
- LangChain & LlamaIndex: Far broader integration ecosystems if you're already invested in their framework. RAG-Forge complements them by sitting on top of any pipeline.
- Giskard: Strong general-purpose ML testing story beyond RAG.
Pick the tool that matches your stage. RAG-Forge's wedge is the full lifecycle — scaffold → evaluate → score → ship — in one CLI, with the RAG Maturity Model as the objective function.
Start from a template
basic
First RAG project, simple Q&A
hybrid
Production-ready document Q&A with reranking
agentic
Multi-hop reasoning with query decomposition
enterprise
Regulated industries with full security suite
n8n
AI automation agency deployments