Regrada Documentation

CI gate for LLM behavior — record real model traffic, turn it into test cases, and block regressions in CI.

> Records LLM API calls via an HTTP proxy (regrada record)

> Converts recorded traces into portable YAML cases + baseline snapshots (regrada accept)

> Runs cases repeatedly, diffs vs baselines, and enforces configurable policies (regrada test)

> Produces CI-friendly reports (stdout summary, Markdown, JUnit) and a GitHub Action

Installation

macOS / Linux

curl -fsSL https://regrada.com/install.sh | sh

The installer downloads a prebuilt binary and installs it to ~/.local/bin/regrada. If regrada isn't found, add ~/.local/bin to your PATH.
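As a minimal sketch of the PATH fix described above, the snippet below prepends ~/.local/bin for the current shell session only if it is not already present. Making it permanent by adding the same lines to a shell startup file (for example ~/.profile) is an assumption about your shell setup, not something the installer is documented to do.

```shell
# Prepend ~/.local/bin to PATH for this session, but only if it is missing.
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;  # already on PATH, nothing to do
  *) PATH="$HOME/.local/bin:$PATH"; export PATH ;;
esac
```

After this, command -v regrada should print the installed path if the install succeeded.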

Windows

The installer targets macOS/Linux. On Windows, run Regrada via WSL.

Build from source

mkdir -p bin
go build -o ./bin/regrada .
./bin/regrada version

Quick Start (Local)

1. Initialize config + example case

regrada init

2. Configure a provider (OpenAI)

export OPENAI_API_KEY="..."

Edit regrada.yml:

providers:
  default: openai
  openai:
    model: gpt-4o-mini

3. Set baseline mode to local

baseline:
  mode: local

4. Generate baselines and run tests

regrada baseline
regrada test

Core Concepts

Cases

A case is a YAML file (default: regrada/cases/**/*.yml) containing a prompt (chat messages or structured input) plus optional assertions.

Assertions vs Policies

  • Case assertions (assert: in a case file) mark individual runs as pass/fail and feed metrics like pass_rate.
  • Policies (policies: in regrada.yml) decide what counts as a warning or error in CI.

Baselines

A baseline is a stored snapshot (golden output + aggregate metrics) used for regression checks.

Regrada stores baselines under the snapshot directory (default: .regrada/snapshots/), keyed by:

  • Case ID
  • Provider + model
  • Sampling params (temperature/top_p/max tokens/stop)
  • System prompt content

CLI Commands

regrada init

Creates regrada.yml, an example case, and runtime directories.

regrada init

Flags: --path, --force, --non-interactive

regrada record

Starts an HTTP proxy to capture LLM traffic (default: forward proxy with HTTPS MITM).

regrada record
regrada record -- python app.py
regrada record -- npm test

Recorded traces are written to .regrada/traces/ (JSONL) and sessions to .regrada/sessions/.

regrada accept

Converts traces from the latest (or specified) session into cases and baselines.

regrada accept
regrada accept --session .regrada/sessions/20250101-120000.json

regrada baseline

Runs all discovered cases once and writes baseline snapshots.

regrada baseline

regrada test

Runs cases, diffs against baselines, evaluates policies, and writes reports.

regrada test

regrada ca

Manages the local Root CA required for forward-proxy HTTPS interception.

regrada ca init
regrada ca install
regrada ca status
regrada ca uninstall

Configuration (regrada.yml)

Minimal working config for OpenAI:

version: 1

providers:
  default: openai
  openai:
    model: gpt-4o-mini

baseline:
  mode: local

policies:
  - id: assertions
    severity: error
    check:
      type: assertions
      min_pass_rate: 1.0

Providers

Implemented today:

  • openai (Chat Completions)
  • mock (returns "mock response")

Scaffolded but not implemented: anthropic, azure_openai, bedrock

Case Discovery

Defaults (can be overridden under cases:):

  • Roots: ["regrada/cases"]
  • Include globs: ["**/*.yml", "**/*.yaml"]
  • Exclude globs: ["**/README.*"]
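To preview which files the default discovery settings would likely pick up, the sketch below approximates them with find. It is not Regrada's own matcher: list_cases is a hypothetical helper, and it assumes the include and exclude globs apply to file names at any depth under a root.

```shell
# Hypothetical helper: list candidate case files under a root directory,
# approximating include globs **/*.yml and **/*.yaml and exclude glob **/README.*
list_cases() {
  find "$1" -type f \( -name '*.yml' -o -name '*.yaml' \) ! -name 'README.*' | LC_ALL=C sort
}

# Only list when the default root exists in the current directory.
if [ -d regrada/cases ]; then
  list_cases regrada/cases
fi
```

If a file you expect is missing from the output, check its extension and whether it matches an exclude pattern.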

Baseline Modes

Git baseline config (recommended for CI):

baseline:
  mode: git
  git:
    ref: origin/main
    snapshot_dir: .regrada/snapshots

Reports

Enable JUnit output for CI:

report:
  format: [summary, markdown, junit]
  junit:
    path: .regrada/junit.xml

Case Format

Example test case (regrada/cases/**/*.yml):

id: greeting.hello
tags: [smoke]

request:
  messages:
    - role: system
      content: You are a concise assistant.
    - role: user
      content: Say hello and ask for a name.
  params:
    temperature: 0.2
    top_p: 1.0

assert:
  text:
    contains: ["hello"]
    max_chars: 120

> request must specify either messages or input (a YAML map)

> Roles must be system, user, assistant, or tool

> assert.json.schema and assert.json.path are parsed and validated, but the runner does not enforce them yet

Policies

Policies are how you turn runs/diffs into CI gates. Common setup:

policies:
  - id: assertions
    severity: error
    check:
      type: assertions
      min_pass_rate: 1.0

  - id: no_pii
    severity: error
    check:
      type: pii_leak
      detector: pii_strict
      max_incidents: 0

  - id: stable_text
    severity: warn
    check:
      type: variance
      metric: token_jaccard
      max_p95: 0.35

Supported Policy Types

assertions — validates case-level assertions pass rate

json_valid — ensures JSON output validity

text_contains — pattern matching (required phrases)

text_not_contains — negative pattern matching

pii_leak — detects PII leakage with configurable detectors

variance — controls output stability (token Jaccard similarity)

refusal_rate — monitors model refusal behavior

latency — P95 latency thresholds

json_schema — schema validation (scaffolded, not implemented yet)
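For intuition about the token Jaccard measure named in the variance policy, here is a rough sketch that compares two outputs as sets of tokens. It assumes case-sensitive, whitespace-separated tokens. token_jaccard is a hypothetical helper, and how Regrada tokenizes text or derives the value checked against max_p95 is not specified in this document.

```shell
# Hypothetical helper: Jaccard similarity of the token sets of two strings,
# i.e. |A intersect B| / |A union B|, printed with two decimals.
token_jaccard() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, ta)                      # default split: runs of whitespace
    nb = split(b, tb)
    for (i = 1; i <= na; i++) seen_a[ta[i]] = 1
    for (i = 1; i <= nb; i++) seen_b[tb[i]] = 1
    inter = 0; uni = 0
    for (t in seen_a) { uni++; if (t in seen_b) inter++ }
    for (t in seen_b) if (!(t in seen_a)) uni++
    if (uni == 0) { print "1.00"; exit }   # two empty outputs count as identical
    printf "%.2f\n", inter / uni
  }'
}

token_jaccard "hello what is your name" "hello tell me your name"
```

The two sample outputs share three of seven distinct tokens, so the call prints 0.43. Identical outputs score 1.00 and outputs with no tokens in common score 0.00.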

Recording Workflow

Forward Proxy (Recommended)

1. Generate and trust the local CA:

regrada ca init
regrada ca install

2. Run your app/tests through the proxy:

regrada record -- ./run-my-tests.sh

3. Convert the latest session into cases + baselines:

regrada accept

Reverse Proxy (No MITM)

Set capture.proxy.mode: reverse and point your LLM base URL at the proxy. This mode does not require installing the CA, but your application must be configurable to talk to the proxy instead of the upstream API.

Baselines in Git (Recommended for CI)

1. Version-control your snapshot directory

By default, regrada init adds .regrada/ to .gitignore. Un-ignore the snapshots directory:

.regrada/*
!.regrada/snapshots/
!.regrada/snapshots/**
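One way to confirm that the un-ignore rules behave as intended is git check-ignore, which reports whether a path is excluded by the ignore rules. The sketch below wraps it in a hypothetical is_ignored helper. The two example file names are placeholders, not files Regrada is documented to create.

```shell
# Exit status 0 means the path is ignored; 1 means it is not ignored.
# (git check-ignore exits 128 on errors, for example outside a repository.)
is_ignored() {
  git check-ignore -q "$1"
}

# Only report when run inside a Git working tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  for path in .regrada/traces/example.jsonl .regrada/snapshots/example.json; do
    if is_ignored "$path"; then
      echo "ignored: $path"
    else
      echo "not ignored: $path"
    fi
  done
fi
```

With the three rules above in .gitignore, the traces path should be reported as ignored and the snapshots path as not ignored.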

2. Generate and commit snapshots on your baseline branch

regrada baseline
git add .regrada/snapshots regrada/cases regrada.yml
git commit -m "Update Regrada baselines"

3. In PR branches/CI, run tests with git mode

Use baseline.mode: git and baseline.git.ref: origin/main.
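Because git mode reads snapshots from a ref, that ref has to exist in the CI checkout. The following sketch checks for it before tests run. ref_exists is a hypothetical helper built on git rev-parse, and origin/main simply mirrors the baseline.git.ref value used above.

```shell
# Succeeds (exit 0) when the given ref resolves to a commit in this checkout.
ref_exists() {
  git rev-parse --verify --quiet "$1^{commit}" >/dev/null 2>&1
}

# Mirrors baseline.git.ref; change it if your config uses another ref.
baseline_ref="origin/main"
if ref_exists "$baseline_ref"; then
  echo "baseline ref available: $baseline_ref"
else
  echo "baseline ref missing: $baseline_ref (try: git fetch origin main)" >&2
fi
```

A shallow clone that never fetched the baseline branch is a common reason for the ref to be missing.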

GitHub Action

Example workflow configuration:

name: Regrada
on:
  pull_request:

jobs:
  regrada:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # required for baseline.mode=git

      - uses: regrada-ai/regrada@v1
        with:
          config: regrada.yml
          comment-on-pr: true
          working-directory: .

Action Inputs

Input              Description                               Default
config             Path to regrada.yml/regrada.yaml          regrada.yml
comment-on-pr      Post .regrada/report.md as a PR comment   true
working-directory  Directory to run regrada test in          .

Action Outputs

total — Total number of cases

passed — Number of passed cases

warned — Number of warned cases

failed — Number of failed cases

result — success, warning, or failure

Exit Codes

regrada test uses exit codes to help CI distinguish failure modes:

0 — No failing policy violations

1 — Internal error (provider/report/etc.)

2 — Policy violations (as configured by ci.fail_on)

3 — Invalid config / no cases discovered

4 — Missing baseline snapshot

5 — Evaluation error (provider call failed, timeout, etc.)
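In a CI script, the codes listed above can be turned into readable log messages. The sketch below is one way to do that. describe_regrada_exit is a hypothetical helper, and its messages are taken from the list above.

```shell
# Hypothetical helper: translate a `regrada test` exit code into its documented meaning.
describe_regrada_exit() {
  case "$1" in
    0) echo "no failing policy violations" ;;
    1) echo "internal error" ;;
    2) echo "policy violations" ;;
    3) echo "invalid config or no cases discovered" ;;
    4) echo "missing baseline snapshot" ;;
    5) echo "evaluation error" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

describe_regrada_exit 4
```

The last line prints "missing baseline snapshot". In a pipeline you would typically run regrada test, capture $? immediately afterwards, pass it to the helper for logging, and then exit with the same code so the job status is preserved.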

Troubleshooting

"config not found"

Create regrada.yml by running regrada init, or pass --config to point at a config file in a different location.

Exit code 4 / baseline missing

Run regrada baseline on your baseline ref and commit snapshots. Ensure CI fetches baseline.git.ref.

OpenAI auth errors

Set OPENAI_API_KEY or configure providers.openai.api_key in regrada.yml.

Recording HTTPS fails

Run regrada ca init and then regrada ca install, and confirm that capture.proxy.allow_hosts includes your provider's host.