CI for AI systems

Detect behavioral regressions before they hit production. Automated testing for non-deterministic AI systems.

Regrada captures LLM interactions, runs evaluations against test cases, and detects when your AI's behavior changes between commits.

$ regrada test
Running test cases...
✓ greeting.hello
✓ refund.lookup
✗ customer.onboarding
  Policy violation: assertions (min_pass_rate: 1.0)
Total: 3 | Passed: 2 | Failed: 1

What Regrada Does

LLMs are non-deterministic — they don't fail loudly, they change quietly. Regrada catches behavior changes before deployment.

1. Record LLM API calls via HTTP proxy (regrada record)

2. Convert traces into YAML test cases (regrada accept)

3. Run cases against baselines and enforce policies (regrada test)

Without Regrada

Model updates break production

Prompt changes cause silent failures

No way to catch regressions early

Manual testing is slow & incomplete

With Regrada

Regressions caught in CI

Every change is validated

Automated behavioral testing

Ship with confidence

Core Features

Everything you need to test and validate your AI systems.

🔍

Policy-Based Detection

Configurable policies for assertions, PII detection, text variance, refusal rates, and latency thresholds. Define policies as code and enforce them in CI.
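As a rough sketch of what a policy-as-code file could look like (key names and structure here are illustrative assumptions, not Regrada's actual schema — only `min_pass_rate: 1.0` appears in the output above):

```yaml
# Hypothetical policy file — keys are illustrative, not Regrada's documented schema.
policies:
  assertions:
    min_pass_rate: 1.0   # fail the run if any assertion fails (as in the terminal output above)
  pii:
    enabled: true        # flag responses containing detected PII
  latency:
    max_ms: 2000         # fail if response latency exceeds this threshold
```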

🧪

YAML Test Cases

Define test cases with structured inputs and assertions such as text-contains, max-characters, and JSON Schema validation. Portable files stored in your repo.
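For a sense of shape, a test case might look like the sketch below — every field name and value is a hypothetical illustration, not Regrada's documented format:

```yaml
# Hypothetical test case — field names and values are illustrative only.
case: refund.lookup
input:
  messages:
    - role: user
      content: "What's the status of my refund?"
assertions:
  - type: text_contains   # assertion kinds mirror the blurb above
    value: "refund"
  - type: max_chars
    value: 800
```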

📊

HTTP Proxy Recording

Capture LLM API traffic with HTTPS MITM proxy. Records to JSONL with session metadata and redaction presets. Zero code changes required.

🚦

CI/CD Enforcement

First-class GitHub Actions integration with automatic PR comments, regression failures, and detailed test output. Works with any CI system.
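A minimal CI job could look like the following sketch: the workflow syntax is standard GitHub Actions, `regrada test` is the command described on this page, and the installation step is an assumption (not covered here):

```yaml
# Sketch of a GitHub Actions job running Regrada on every pull request.
name: regrada
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install the regrada CLI here (installation method not covered on this page)
      - run: regrada test   # a non-zero exit on policy violations blocks the merge
```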

🔌

Model-Agnostic

Automatically detects and captures calls to OpenAI, Anthropic, Azure OpenAI, Google AI, Cohere, Ollama, and custom endpoints.

🧠

Baseline Modes

Store and compare baselines flexibly with local filesystem snapshots or git refs. Baselines keyed by case, provider, model, and params.
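Selecting a baseline mode might be a one-line config choice, roughly like this (keys are illustrative assumptions, not Regrada's actual schema):

```yaml
# Hypothetical baseline config — keys are illustrative only.
baseline:
  mode: git    # or "local" for filesystem snapshots
  ref: main    # git ref to compare against when mode is "git"
```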

How It Works

A simple workflow that integrates into your existing CI/CD pipeline.

1

Record Traffic

Run regrada record to capture LLM API calls via HTTP proxy

2

Accept Traces

Run regrada accept to convert recorded traces into YAML test cases and baseline snapshots

3

Run Tests

Run regrada test to execute cases, diff against baselines, and evaluate policies

4

Enforce in CI

Integrate with GitHub Actions to block merges on policy violations

Tests Pass

All checks successful, ready to deploy

Tests Fail

Regression detected, review changes

Who It's For

AI startups shipping fast

Teams running LLMs in production

Infra / platform engineers

Enterprises with compliance requirements

If AI is part of your critical path, you need Regrada.

Why Regrada

Traditional tests can't catch LLM behavior changes

Model updates and prompt changes need the same rigor as code

Catching regressions in CI is faster and cheaper than debugging in production

Regrada makes AI systems testable and reliable.

Pricing

Start free and scale as you grow. No hidden fees.

Starter

Free

For individual developers exploring Regrada

Team

Popular
$29/month

For small teams running AI features in production

Scale

$99/month

For companies scaling CI and AI workflows

Test your AI like you test your code.

Catch behavioral regressions before they reach production.

Stay updated on new features and releases