Recipe B · 04 · ~10 min · beginner

Smoke-test a config change.

A 90-second CI loop: generate 1k sequences, diff their key marginals against a saved baseline, fail loudly if a contract regresses or a distribution shifts beyond tolerance. Drops into a pytest fixture so every PR that touches the config gets its biology rechecked before merge.

01 Snapshot baseline stored marginals from a good run
02 Generate small panel · fixed seed
03 Assert contracts · KS · totals
PART 01

What a smoke test is and isn't.

A smoke test is not a science experiment. It's a guard against subtle regressions — someone edits a config, you want to know the engine still produces in-frame sequences with roughly the right V-usage shape. Tolerances are loose; the goal is to catch obvious breakage in under two minutes.

Good smoke checks
  • productive rate · stable across runs at the same seed — should match baseline within 2%
  • truth_v_call top-5 set · the 5 most common V calls shouldn't change identity
  • junction_aa length mean · a single summary stat — stable to ±1 amino acid
  • contract violations · zero — any productive=False under ga.productive() is a bug
Bad smoke checks (saved for B · 03)
  • full V-usage KL divergence · too sensitive to RNG; sits in the realism audit instead
  • motif distributions · expensive; not necessary for catching regressions
  • per-record assertions · slow; smoke is about marginals
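The first good check — productive rate within 2% of baseline — is simple enough to sketch in isolation. A minimal version, assuming a pandas DataFrame with the recipe's boolean `productive` column (the `check_productive_rate` helper and the toy panel are illustrative, not part of the recipe's test file):

```python
import pandas as pd

def check_productive_rate(panel: pd.DataFrame, baseline_rate: float,
                          tol: float = 0.02) -> bool:
    """True when the observed productive rate sits within tol of baseline."""
    rate = panel["productive"].mean()
    return abs(rate - baseline_rate) <= tol

# Toy panel: 97 productive records, 3 not → rate = 0.97.
panel = pd.DataFrame({"productive": [True] * 97 + [False] * 3})

assert check_productive_rate(panel, baseline_rate=0.96)      # 0.97 vs 0.96: within 2%
assert not check_productive_rate(panel, baseline_rate=0.90)  # 0.97 vs 0.90: drifted
```

The tolerance is deliberately loose — tighten it and the test starts flagging RNG noise instead of regressions.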
PART 02

The pytest fixture.

One file in your test suite. Same seed every run — outputs are bit-identical, so any change in the marginals comes from a code change, not RNG drift.

tests/test_smoke.py
import json, pytest
import GenAIRR as ga
from pathlib import Path

BASELINE = json.loads(Path("tests/baseline.json").read_text())

@pytest.fixture(scope="module")
def panel():
    return (
        ga.Experiment.on("human_igh")
          .recombine()
          .mutate(model="s5f", count=(0, 25))
          .run_records(
              n=1000,
              seed=42,
              respect=ga.productive(),
              expose_provenance=True,
          ).to_dataframe()
    )

def test_no_contract_violations(panel):
    assert panel["productive"].all()

def test_v_usage_stable(panel):
    top5 = set(panel["truth_v_call"]
                .str.split("*").str[0]
                .value_counts().head(5).index)
    assert top5 == set(BASELINE["v_top5"])

def test_cdr3_length_stable(panel):
    mean = panel["cdr3_aa"].str.len().mean()
    assert abs(mean - BASELINE["cdr3_mean"]) < 1.0
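The outline above also lists a KS check, which the test file doesn't show. A minimal sketch with synthetic data — in a real test you'd pass `panel["cdr3_aa"].str.len()` against a raw length sample stored in the baseline (a hypothetical extra key, not part of the recipe's baseline.json schema):

```python
import numpy as np
from scipy.stats import ks_2samp

def lengths_shifted(current, baseline, alpha=1e-6):
    """True only on an unambiguous shift; loose on purpose, this is smoke."""
    return ks_2samp(current, baseline).pvalue < alpha

rng = np.random.default_rng(0)
base = rng.normal(15, 2, 1000)     # stand-in for the stored baseline sample
same = rng.normal(15, 2, 1000)     # same distribution, different draw
shifted = rng.normal(18, 2, 1000)  # mean shifted by 3 residues

assert not lengths_shifted(same, base)  # same distribution: no alarm
assert lengths_shifted(shifted, base)   # gross shift: alarm
```

The tiny alpha is the smoke-test stance again: only a gross distribution shift should fail CI; fine-grained divergence checks belong in the realism audit.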
PART 03

Capturing the baseline.

First-time setup: run the panel once on a known-good build, save the summary marginals, commit them. The test then asserts every subsequent run reproduces them.

tools/capture_baseline.py
import json, GenAIRR as ga
from pathlib import Path

df = (
    ga.Experiment.on("human_igh")
      .recombine()
      .mutate(model="s5f", count=(0, 25))
      .run_records(n=1000, seed=42,
                   respect=ga.productive(),
                   expose_provenance=True)
      .to_dataframe()
)

baseline = {
    "v_top5": list(df["truth_v_call"].str.split("*").str[0]
                  .value_counts().head(5).index),
    "cdr3_mean": float(df["cdr3_aa"].str.len().mean()),
}
Path("tests/baseline.json").write_text(json.dumps(baseline, indent=2))
PART 04

When the test fails.

A failure means one of three things — and the first thing to do is figure out which.

1 · You changed config on purpose

Run capture_baseline.py again, commit the new baseline.json alongside the config change. Test passes on the next run.

2 · You broke RNG determinism

Same seed should produce same output. If the test fails on a change that shouldn't have touched sampling, you've probably altered the order of rng.draw() calls. Bisect to find the culprit.
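The property being bisected for can be stated as a two-line check: same seed in, bit-identical output out. A sketch with a stand-in sampler (`simulate` is illustrative, not a GenAIRR function):

```python
import numpy as np

def simulate(seed: int, n: int = 1000) -> np.ndarray:
    """Stand-in for the engine's sampling path: seed → e.g. V-gene indices."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 60, size=n)

a = simulate(42)
b = simulate(42)
assert (a == b).all()  # bit-identical: any diff is a code change, not RNG drift
```

If this invariant holds for the engine but the smoke test still fails, the change reordered or added draws somewhere upstream — bisect at the commit level, not the record level.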

3 · A contract regressed

If test_no_contract_violations trips, that's a real biology bug — a non-productive record snuck past ga.productive(). Investigate immediately before merging anything else.
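A plausible first triage step is pulling the offending rows so you can eyeball what regressed. A sketch using the recipe's column names on a toy panel (the helper and the toy data are illustrative):

```python
import pandas as pd

def failing_records(panel: pd.DataFrame) -> pd.DataFrame:
    """Rows where the productivity contract failed, with the fields to inspect."""
    return panel.loc[~panel["productive"], ["truth_v_call", "cdr3_aa"]]

panel = pd.DataFrame({
    "productive":   [True, False, True],
    "truth_v_call": ["IGHV1-2*02", "IGHV3-23*01", "IGHV4-34*01"],
    "cdr3_aa":      ["CARDY", "CAR*W", "CAKDS"],  # '*' marks a stop codon in the bad row
})

bad = failing_records(panel)
assert len(bad) == 1
assert bad.iloc[0]["truth_v_call"] == "IGHV3-23*01"
```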

Related recipes

Where to next.

B · 03 · Audit a simulation for biological realism →

When you need more than smoke — the full marginal panel against an empirical reference.

C · 03 · Replay with one knob changed →

Once the smoke test trips, replay specific records with only one parameter perturbed.