Regrada Documentation
CI gate for LLM behavior — record real model traffic, turn it into test cases, and block regressions in CI.
> Records LLM API calls via an HTTP proxy (regrada record)
> Converts recorded traces into portable YAML cases + baseline snapshots (regrada accept)
> Runs cases repeatedly, diffs vs baselines, and enforces configurable policies (regrada test)
> Produces CI-friendly reports (stdout summary, Markdown, JUnit) and a GitHub Action
Installation
macOS / Linux
curl -fsSL https://regrada.com/install.sh | shThe installer downloads a prebuilt binary and installs it to ~/.local/bin/regrada. If regrada isn't found, add ~/.local/bin to your PATH.
Windows
The installer targets macOS/Linux. On Windows, run Regrada via WSL.
Build from source
mkdir -p bingo build -o ./bin/regrada ../bin/regrada versionQuick Start (Local)
1. Initialize config + example case
regrada init2. Configure a provider (OpenAI)
export OPENAI_API_KEY="..."Edit regrada.yml:
providers:
default: openai
openai:
model: gpt-4o-mini3. Set baseline mode to local
baseline: mode: local
4. Generate baselines and run tests
regrada baselineregrada testCore Concepts
Cases
A case is a YAML file (default: regrada/cases/**/*.yml) containing a prompt (chat messages or structured input) plus optional assertions.
Assertions vs Policies
- Case assertions (
assert:in a case file) mark individual runs as pass/fail and feed metrics likepass_rate. - Policies (
policies:inregrada.yml) decide what counts as a warning or error in CI.
Baselines
A baseline is a stored snapshot (golden output + aggregate metrics) used for regression checks.
Regrada stores baselines under the snapshot directory (default: .regrada/snapshots/), keyed by:
- Case ID
- Provider + model
- Sampling params (temperature/top_p/max tokens/stop)
- System prompt content
CLI Commands
regrada init
Creates regrada.yml, an example case, and runtime directories.
regrada initFlags: --path, --force, --non-interactive
regrada record
Starts an HTTP proxy to capture LLM traffic (default: forward proxy with HTTPS MITM).
regrada recordregrada record -- python app.pyregrada record -- npm testRecorded traces are written to .regrada/traces/ (JSONL) and sessions to .regrada/sessions/.
regrada accept
Converts traces from the latest (or specified) session into cases and baselines.
regrada acceptregrada accept --session .regrada/sessions/20250101-120000.jsonregrada baseline
Runs all discovered cases once and writes baseline snapshots.
regrada baselineregrada test
Runs cases, diffs against baselines, evaluates policies, and writes reports.
regrada testregrada ca
Manages the local Root CA required for forward-proxy HTTPS interception.
regrada ca initregrada ca installregrada ca statusregrada ca uninstallConfiguration (regrada.yml)
Minimal working config for OpenAI:
version: 1
providers:
default: openai
openai:
model: gpt-4o-mini
baseline:
mode: local
policies:
- id: assertions
severity: error
check:
type: assertions
min_pass_rate: 1.0Providers
Implemented today:
openai(Chat Completions)mock(returns "mock response")
Scaffolded but not implemented: anthropic, azure_openai, bedrock
Case Discovery
Defaults (can be overridden under cases:):
- Roots:
["regrada/cases"] - Include globs:
["**/*.yml", "**/*.yaml"] - Exclude globs:
["**/README.*"]
Baseline Modes
Git baseline config (recommended for CI):
baseline:
mode: git
git:
ref: origin/main
snapshot_dir: .regrada/snapshotsReports
Enable JUnit output for CI:
report:
format: [summary, markdown, junit]
junit:
path: .regrada/junit.xmlCase Format
Example test case (regrada/cases/**/*.yml):
id: greeting.hello
tags: [smoke]
request:
messages:
- role: system
content: You are a concise assistant.
- role: user
content: Say hello and ask for a name.
params:
temperature: 0.2
top_p: 1.0
assert:
text:
contains: ["hello"]
max_chars: 120> request must specify either messages or input (a YAML map)
> Roles must be system, user, assistant, or tool
> assert.json.schema and assert.json.path are parsed/validated but not enforced yet by the runner
Policies
Policies are how you turn runs/diffs into CI gates. Common setup:
policies:
- id: assertions
severity: error
check:
type: assertions
min_pass_rate: 1.0
- id: no_pii
severity: error
check:
type: pii_leak
detector: pii_strict
max_incidents: 0
- id: stable_text
severity: warn
check:
type: variance
metric: token_jaccard
max_p95: 0.35Supported Policy Types
assertions — validates case-level assertions pass rate
json_valid — ensures JSON output validity
text_contains — pattern matching (required phrases)
text_not_contains — negative pattern matching
pii_leak — detects PII leakage with configurable detectors
variance — controls output stability (token Jaccard similarity)
refusal_rate — monitors model refusal behavior
latency — P95 latency thresholds
json_schema — schema validation (scaffolded, not implemented yet)
Recording Workflow
Forward Proxy (Recommended)
1. Generate and trust the local CA:
regrada ca initregrada ca install2. Run your app/tests through the proxy:
regrada record -- ./run-my-tests.sh3. Convert the latest session into cases + baselines:
regrada acceptReverse Proxy (No MITM)
Set capture.proxy.mode: reverse and point your LLM base URL at the proxy. This mode does not require installing the CA, but your application must be configurable to talk to the proxy instead of the upstream API.
Baselines in Git (Recommended for CI)
1. Version-control your snapshot directory
By default, regrada init adds .regrada/ to .gitignore. Un-ignore the snapshots directory:
.regrada/* !.regrada/snapshots/ !.regrada/snapshots/**
2. Generate and commit snapshots on your baseline branch
regrada baselinegit add .regrada/snapshots regrada/cases regrada.ymlgit commit -m "Update Regrada baselines"3. In PR branches/CI, run tests with git mode
Use baseline.mode: git and baseline.git.ref: origin/main.
GitHub Action
Example workflow configuration:
name: Regrada
on:
pull_request:
jobs:
regrada:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # required for baseline.mode=git
- uses: regrada-ai/regrada@v1
with:
config: regrada.yml
comment-on-pr: true
working-directory: .Action Inputs
| Input | Description | Default |
|---|---|---|
| config | Path to regrada.yml/regrada.yaml | regrada.yml |
| comment-on-pr | Post .regrada/report.md as a PR comment | true |
| working-directory | Directory to run regrada test in | . |
Action Outputs
total — Total number of cases
passed — Number of passed cases
warned — Number of warned cases
failed — Number of failed cases
result — success, warning, or failure
Exit Codes
regrada test uses exit codes to help CI distinguish failure modes:
0 — No failing policy violations
1 — Internal error (provider/report/etc.)
2 — Policy violations (as configured by ci.fail_on)
3 — Invalid config / no cases discovered
4 — Missing baseline snapshot
5 — Evaluation error (provider call failed, timeout, etc.)
Troubleshooting
"config not found"
Create regrada.yml by running regrada init or pass --config to specify a different path.
Exit code 4 / baseline missing
Run regrada baseline on your baseline ref and commit snapshots. Ensure CI fetches baseline.git.ref.
OpenAI auth errors
Set OPENAI_API_KEY or configure providers.openai.api_key in regrada.yml.
Recording HTTPS fails
Run regrada ca init + regrada ca install, and confirm capture.proxy.allow_hosts includes your provider host.