[ rag-forge ]
v0.1.3 — Audit resilience

Catch RAG failures before your users do.

RAG-Forge audits any RAG pipeline against the RAG Maturity Model. Detect hallucinations, retrieval bypass, silent quality regressions, and cost drift before they ship — with a single CLI that works on your existing stack.

$ npm install -g @rag-forge/cli
rag-forge audit
$ rag-forge audit --golden-set eval/golden_set.json --judge claude
RAG-Forge Audit
===============
Samples: 19
Metrics: 4 (faithfulness, context_relevance, answer_relevance, hallucination)
Judge calls: 76 total
Judge model: claude-sonnet-4-20250514
Estimated cost: ~$1.25 USD
---
[ 1/19] [query redacted] faith=0.92 ctx=0.85 ans=0.91 hall=0.95 OK (8.2s)
[ 2/19] [query redacted] faith=0.88 ctx=0.79 ans=0.90 hall=0.93 OK (9.1s)
[ 3/19] [query redacted] faith=0.00 ctx=0.00 ans=0.78 hall=0.85 WARN 2 skipped (11.4s)
...
---
Audit complete in 9m 23s
Scored: 72 Skipped: 4
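The "Scored: 72 / Skipped: 4" line reflects skip-aware aggregation: judge calls that fail or are skipped are excluded from the average instead of dragging it down as zeros. A minimal sketch of that idea (the `aggregate` function and score shape are illustrative, not RAG-Forge internals):

```typescript
interface SampleScore {
  value: number | null; // judge score in [0, 1], or null if the call was skipped
  skipped: boolean;
}

// Skip-aware mean: only scored samples contribute to the average.
function aggregate(scores: SampleScore[]): { mean: number; scored: number; skipped: number } {
  const scored = scores.filter((s) => !s.skipped && s.value !== null);
  const skipped = scores.length - scored.length;
  const mean = scored.length
    ? scored.reduce((sum, s) => sum + (s.value as number), 0) / scored.length
    : 0;
  return { mean, scored: scored.length, skipped };
}
```

Without this, a single skipped call (scored as 0.00, like sample 3 above) would mask an otherwise healthy pipeline.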
RMM-3: Better Trust
v0.1.3 just shipped
MIT licensed
OIDC Trusted Publishers

The RAG quality crisis

RAG has become the dominant architecture for enterprise AI. Yet the ecosystem suffers from a critical gap between building RAG pipelines and knowing whether they actually work.

32%

of teams cite quality as the #1 GenAI deployment barrier

LangChain State of AI Agents 2026

RMM-0

is where most production RAG pipelines actually sit — naive vector search with no quality framework

RAG-Forge Maturity Model

Few

open-source frameworks score any pipeline against a maturity model with framework-agnostic CLI tooling — RAG-Forge is one of them

RAG-Forge

Everything you need to ship a production RAG pipeline

Pipeline Primitives

Five chunking strategies, dense + sparse + hybrid retrieval, contextual enrichment, and reranking. Bring your own embedding model.

$ create_chunker(ChunkConfig(strategy="semantic"))

Evaluation as a CI/CD Gate

RAGAS, DeepEval, and LLM-as-Judge baked in. Cost + time estimates before each run, skip-aware aggregation, configurable thresholds in rag-forge.config.ts.

$ rag-forge audit --golden-set qa.json --judge claude
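Thresholds live in rag-forge.config.ts. The exact schema is whatever `rag-forge init` generates; the sketch below only illustrates the idea of per-metric gates, and every field name in it is an assumption:

```typescript
// rag-forge.config.ts — illustrative shape only; field names are assumed,
// not the documented schema. Consult the generated config for the real one.
export default {
  judge: "claude",
  thresholds: {
    faithfulness: 0.85, // matches the RMM-3 gate: faithfulness > 85%
    context_relevance: 0.75,
    answer_relevance: 0.8,
    hallucination: 0.9,
  },
  // A failing threshold should produce a non-zero exit code,
  // which is what lets `rag-forge audit` act as a CI/CD gate.
  failOnThreshold: true,
};
```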

Built-in Observability

OpenTelemetry tracing on every pipeline stage. Drift detection, cost estimation, semantic caching.

$ rag-forge drift report --baseline baseline.json

Production Templates

Five battle-tested starting points. shadcn/ui model — you own every line of code.

$ rag-forge init enterprise

The RAG Maturity Model

Where does your pipeline stand? Score any RAG system from RMM-0 (naive) to RMM-5 (enterprise).

  1. RMM-0

    Naive

    Basic vector search works

    Gate: Vector retrieval returns results

  2. RMM-1

    Better Recall

    Hybrid search active, Recall@5 > 70%

    Gate: Dense + sparse + RRF fusion

  3. RMM-2

    Better Precision

    Reranker active, nDCG@10 +10%

    Gate: Cross-encoder reranking on top results

  4. RMM-3

    Better Trust

    ← Most pipelines stop here

    Guardrails, faithfulness > 85%, citations

    Gate: InputGuard + OutputGuard active

  5. RMM-4

    Better Workflow

    Caching, P95 < 4s, cost tracking

    Gate: Semantic cache + telemetry + cost meter

  6. RMM-5

    Enterprise

    Drift detection, CI/CD gates, adversarial tests

    Gate: All audit thresholds pass
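Because each level's gate builds on the ones below it, a pipeline sits at the highest level whose gate, and every gate beneath it, passes. A hedged sketch of that scoring rule (gate names here are shorthand for the model above, not RAG-Forge's actual identifiers):

```typescript
// Gates in RMM order; a pipeline's level is the longest unbroken
// prefix of passing gates.
const gates = [
  "vector_retrieval",     // RMM-0: vector retrieval returns results
  "hybrid_fusion",        // RMM-1: dense + sparse + RRF fusion
  "reranking",            // RMM-2: cross-encoder reranking on top results
  "guardrails",           // RMM-3: InputGuard + OutputGuard active
  "cache_and_telemetry",  // RMM-4: semantic cache + telemetry + cost meter
  "audit_thresholds",     // RMM-5: all audit thresholds pass
] as const;

function rmmLevel(passed: Set<string>): number {
  let level = -1; // -1: even the RMM-0 gate fails
  for (const gate of gates) {
    if (!passed.has(gate)) break; // a missing gate caps the level
    level++;
  }
  return level;
}
```

Note the prefix rule: a pipeline with reranking but no hybrid fusion still scores RMM-0, because skipping a level leaves its gate unmet.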

Get started in 60 seconds

# Install the CLI
npm install -g @rag-forge/cli

# Scaffold a project (use --directory to name the folder)
rag-forge init basic --directory my-rag-project
cd my-rag-project

# Drop your documents into a folder of your choice
mkdir docs
echo "RAG-Forge is a CLI for building and evaluating RAG pipelines." > docs/example.md

# Index your docs and run an audit
rag-forge index --source ./docs
rag-forge audit --golden-set eval/golden_set.json
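The audit step needs a golden set: queries paired with reference answers for the judge to score against. A minimal sketch of eval/golden_set.json (the field names are assumptions, not the documented schema; check the file scaffolded by `rag-forge init`):

```typescript
// Illustrative shape of eval/golden_set.json — field names are assumed.
const goldenSet = [
  {
    query: "What does RAG-Forge audit do?",
    reference:
      "It scores a RAG pipeline's answers for faithfulness, relevance, and hallucination.",
  },
  {
    query: "What command scaffolds a new project?",
    reference: "rag-forge init, with a template name such as basic or enterprise.",
  },
];
```

A handful of well-chosen entries is enough to start; the audit output above used 19.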

How RAG-Forge compares

Feature                                    rag-forge   langchain   llamaindex   ragas
Framework agnostic (audit any pipeline)    yes         no          partial      yes
Evaluation built in (CI/CD gate)           yes         partial     partial      yes
RAG Maturity Model scoring                 yes         no          no           no
OpenTelemetry native                       yes         partial     no           no
MCP server                                 yes         no          no           no
CLI scaffolding                            yes         no          partial      no
Code ownership (shadcn model)              yes         no          no           no
Drift detection                            yes         no          no           no

Comparison based on publicly available features as of April 2026.

Peer strengths worth knowing

  • RAGAS: Deeper metric research and a larger community. RAG-Forge's evaluator supports RAGAS as a backend — `rag-forge audit --evaluator ragas`.
  • LangChain & LlamaIndex: Far broader integration ecosystems if you're already invested in their framework. RAG-Forge complements them by sitting on top of any pipeline.
  • Giskard: Strong general-purpose ML testing story beyond RAG.

Pick the tool that matches your stage. RAG-Forge's wedge is the full lifecycle — scaffold → evaluate → score → ship — in one CLI, with the RAG Maturity Model as the objective function.

Start from a template

basic

Beginner

First RAG project, simple Q&A

$ rag-forge init basic

hybrid

Intermediate

Production-ready document Q&A with reranking

$ rag-forge init hybrid

agentic

Advanced

Multi-hop reasoning with query decomposition

$ rag-forge init agentic

enterprise

Advanced

Regulated industries with full security suite

$ rag-forge init enterprise

n8n

Intermediate

AI automation agency deployments

$rag-forge init n8n