CI for AI systems

Detect behavioral regressions before they hit production. Automated testing for non-deterministic AI systems.

Regrada captures LLM interactions, runs evaluations against test cases, and detects when your AI's behavior changes between commits.

$ regrada test
Running test cases...
✓ greeting.hello
✓ refund.lookup
✗ customer.onboarding
  Policy violation: assertions (min_pass_rate: 1.0)
Total: 3 | Passed: 2 | Failed: 1

What Regrada Does

LLMs are non-deterministic — they don't fail loudly, they change quietly. Regrada catches behavior changes before deployment.

1. Record LLM API calls via HTTP proxy (regrada record)

2. Convert traces into YAML test cases (regrada accept)

3. Run cases against baselines and enforce policies (regrada test)

Without Regrada

Model updates break production

Prompt changes cause silent failures

No way to catch regressions early

Manual testing is slow & incomplete

With Regrada

Regressions caught in CI

Every change is validated

Automated behavioral testing

Ship with confidence

Core Features

Everything you need to test and validate your AI systems.

🔍

Policy-Based Detection

Configurable policies for assertions, PII detection, text variance, refusal rates, and latency thresholds. Define policies as code and enforce them in CI.
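As a rough sketch of what a policy-as-code file could look like (key names and structure here are illustrative assumptions, not Regrada's actual schema — only `min_pass_rate: 1.0` appears in the output above):

```yaml
# Hypothetical policy file — keys are illustrative, not Regrada's documented schema.
policies:
  assertions:
    min_pass_rate: 1.0   # fail the run if any assertion fails (as in the terminal output above)
  pii:
    enabled: true        # flag responses containing detected PII
  latency:
    max_ms: 2000         # fail if response latency exceeds this threshold
```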

🧪

YAML Test Cases

Define test cases with structured inputs and assertions such as text-contains, max-characters, and JSON Schema validation. Portable files stored in your repo.
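For a sense of shape, a test case might look like the sketch below — every field name and value is a hypothetical illustration, not Regrada's documented format:

```yaml
# Hypothetical test case — field names and values are illustrative only.
case: refund.lookup
input:
  messages:
    - role: user
      content: "What's the status of my refund?"
assertions:
  - type: text_contains   # assertion kinds mirror the blurb above
    value: "refund"
  - type: max_chars
    value: 800
```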

📊

HTTP Proxy Recording

Capture LLM API traffic with HTTPS MITM proxy. Records to JSONL with session metadata and redaction presets. Zero code changes required.

🚦

CI/CD Enforcement

First-class GitHub Actions integration with automatic PR comments, regression failures, and detailed test output. Works with any CI system.
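A minimal CI job could look like the following sketch: the workflow syntax is standard GitHub Actions, `regrada test` is the command described on this page, and the installation step is an assumption (not covered here):

```yaml
# Sketch of a GitHub Actions job running Regrada on every pull request.
name: regrada
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install the regrada CLI here (installation method not covered on this page)
      - run: regrada test   # a non-zero exit on policy violations blocks the merge
```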

🔌

Model-Agnostic

Automatically detects and captures calls to OpenAI, Anthropic, Azure OpenAI, Google AI, Cohere, Ollama, and custom endpoints.

🧠

Baseline Modes

Store and compare baselines flexibly with local filesystem snapshots or git refs. Baselines keyed by case, provider, model, and params.
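Selecting a baseline mode might be a one-line config choice, roughly like this (keys are illustrative assumptions, not Regrada's actual schema):

```yaml
# Hypothetical baseline config — keys are illustrative only.
baseline:
  mode: git    # or "local" for filesystem snapshots
  ref: main    # git ref to compare against when mode is "git"
```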

How It Works

A simple workflow that integrates into your existing CI/CD pipeline.

1

Record Traffic

Run regrada record to capture LLM API calls via HTTP proxy

2

Accept Traces

Run regrada accept to convert recorded traces into YAML test cases and baseline snapshots

3

Run Tests

Run regrada test to execute cases, diff against baselines, and evaluate policies

4

Enforce in CI

Integrate with GitHub Actions to block merges on policy violations

Tests Pass

All checks successful, ready to deploy

Tests Fail

Regression detected, review changes

Who It's For

AI startups shipping fast

Teams running LLMs in production

Infra / platform engineers

Enterprises with compliance requirements

If AI is part of your critical path, you need Regrada.

Why Regrada

Traditional tests can't catch LLM behavior changes

Model updates and prompt changes need the same rigor as code

Catching regressions in CI is faster and cheaper than debugging in production

Regrada makes AI systems testable and reliable.

Pricing

Start free and scale as you grow. No hidden fees.

Starter

Free

For individual developers exploring Regrada

Team

Popular
$29/month

For small teams running AI features in production

Scale

$99/month

For companies scaling CI and AI workflows

Test your AI like you test your code.

Catch behavioral regressions before they reach production.

Stay updated on new features and releases