The CI gate for LLM behavior.
Your AI doesn't break loudly — it drifts quietly. Regrada catches behavioral regressions before they reach production.
Record live LLM traffic via HTTP proxy, convert traces into test cases automatically, and enforce policies in CI — zero code changes required.
Fastest First Run
$ curl -fsSL https://downloads.regrada.com/install.sh | sh
$ regrada init --non-interactive
Then run regrada baseline and regrada test.
$ regrada test
Loaded 8 test cases · baseline: origin/main (a3f9c12)
✓ summarize.short_input
✓ summarize.long_input
✓ refund.eligible
✓ refund.ineligible
✓ safety.pii_redaction
✓ safety.refusal_rate
✗ onboarding.new_user
variance exceeded: semantic_similarity 0.61 < 0.85
✗ onboarding.returning_user
policy violation: pii_detected (email in response)
8 cases · 6 passed · 2 failed · exit 2
What Regrada Does
LLMs don't throw exceptions when they regress — they just quietly start giving worse answers. Regrada gives your AI the same test coverage you give your code.
regrada record — capture live LLM traffic through an HTTP proxy. No code changes. No SDK wrapping.
regrada accept — promote recorded traces into version-controlled YAML test cases with baselines.
regrada test — replay cases, diff against baselines, evaluate policies, and exit non-zero on failure.
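Those three commands compose into a short loop. A sketch of a typical session, using only the subcommands named above (no flags assumed):

```sh
# 1. Start the recording proxy, then exercise your app against it
regrada record

# 2. Promote the captured traces into YAML test cases with baselines
regrada accept

# 3. Replay the cases, evaluate policies, exit non-zero on regression
regrada test
```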
Without Regrada
✗ Model version bumps silently change tone, format, and refusal behavior
✗ Prompt edits ship untested — regressions surface in user complaints
✗ PII leaks in responses go undetected until audit time
✗ Manual eval scripts are brittle, slow, and never run in CI
With Regrada
✓ Every PR is gated on behavioral correctness, not just unit tests
✓ Prompt and model changes are validated against real production traces
✓ PII and compliance policies enforced automatically on every run
✓ Test cases live in your repo, reviewed in your PRs, run in your CI
Built for Production AI
Everything you need to test, validate, and continuously monitor LLM behavior in real systems.
Zero-Code Traffic Capture
Point your app at Regrada's proxy and every LLM call is recorded automatically — no SDK changes, no instrumentation, no configuration. Works with existing apps on day one.
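In practice, "pointing your app at the proxy" usually means overriding the provider base URL. A hypothetical example for an app using the OpenAI SDK — `OPENAI_BASE_URL` is the SDK's standard override, but the proxy address shown is a placeholder, not a documented Regrada default:

```sh
# Hypothetical: route OpenAI SDK traffic through a local recording proxy.
# The host and port are placeholders — check your proxy's actual address.
export OPENAI_BASE_URL=http://localhost:8080/v1
python app.py   # every completion call now passes through the proxy
```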
Supported Providers
Native support for OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and a built-in mock provider for local smoke tests. Swap supported models without rewriting your tests.
Intelligent Policy Engine
Catch regressions that unit tests miss. Enforce assertions, variance thresholds, refusal rates, PII presence, latency budgets, and JSON schema compliance — all defined as code.
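A sketch of what "policies as code" could look like for the checks listed above. The key names below are illustrative, not Regrada's documented schema:

```yaml
# Hypothetical policy file — field names are illustrative only.
policies:
  semantic_similarity:
    min: 0.85            # flag drift when similarity to baseline drops below this
  pii:
    deny: [email, phone] # fail on detected PII in responses
  refusal_rate:
    max: 0.05
  latency_ms:
    max: 2000
  response_schema: schemas/answer.json  # enforce JSON schema compliance
```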
Automatic PII & Secrets Redaction
Sensitive data is stripped from traces before they ever leave your environment. Ship test artifacts to CI without exposing customer data or API keys.
Native GitHub Actions Integration
Regrada runs as a CI step, posts results as PR comments, and blocks merges on policy violations. Catch regressions in the pull request, not in the post-mortem.
Web Dashboard
Visualize trace history, compare baselines side by side, and drill into failing assertions — all from a single dashboard. No more grepping through CI logs.
How It Works
From first trace to protected CI in four steps. No new infrastructure. No new programming model.
Record Real Traffic
Run regrada record — it starts an HTTP proxy that intercepts and logs every LLM API call your app makes. Zero instrumentation required.
Accept into Test Cases
Run regrada accept to promote recorded traces into version-controlled YAML test cases with baseline snapshots. Commit them alongside your code.
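A sketch of what a promoted test case might look like in the repo. The field names and layout are illustrative, not Regrada's documented YAML format:

```yaml
# Hypothetical test case promoted from a recorded trace.
case: refund.eligible
request:
  model: gpt-4o
  messages:
    - role: user
      content: "I was double-charged on order 4821. Can I get a refund?"
baseline:
  response_snapshot: traces/refund.eligible.json  # snapshot to diff against
```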
Run Evaluations
Run regrada test to replay cases, diff outputs against baselines, and evaluate every configured policy — assertions, variance, PII, latency, and more.
Gate Your CI
Add regrada test to your GitHub Actions workflow. Regressions block the merge. Results post as a PR comment. The rest of your pipeline stays untouched.
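A minimal workflow sketch. The install script and `regrada test` command come from this page; the job layout and step names are ordinary GitHub Actions conventions, not Regrada-specific:

```yaml
name: regrada
on: [pull_request]
jobs:
  behavior-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Regrada
        run: curl -fsSL https://downloads.regrada.com/install.sh | sh
      - name: Run behavioral tests
        run: regrada test   # non-zero exit fails the check and blocks the merge
```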
Tests Pass
All checks successful, ready to deploy
Tests Fail
Regression detected, review changes
Who It's For
Built for teams that ship AI features and need to know when they break.
AI Product Teams
Ship prompt updates and model upgrades with the same confidence as a code deploy.
Platform Engineers
Add LLM behavior gates to your existing CI/CD pipeline without changing your stack.
ML Engineers
Regression-test fine-tuned models and evaluate output quality against real baselines.
Compliance-Conscious Teams
Enforce PII policies, redact sensitive traces, and produce audit-ready test reports automatically.
If AI is in your critical path, it deserves the same rigor as your code.
Why Regrada
• Unit tests assert that your code runs. They say nothing about whether your AI still behaves correctly after a model bump or prompt edit.
• LLM regressions are invisible in logs and silent in metrics — they show up as churn, support tickets, and damaged trust.
• The fastest way to catch a regression is in the pull request. Regrada puts that check exactly there — automated, repeatable, and zero overhead.
Regrada is the safety net your AI pipeline is missing.
Pricing
Start free and scale as you grow. No hidden fees.
Starter
For individual developers exploring Regrada
Team
For small teams running AI features in production
Scale
For companies scaling CI and AI workflows
Your AI deserves a test suite.
Start catching behavioral regressions in CI — before your users catch them in production.
Get notified about new features, provider support, and early access